Transmitting messages across noisy channels is an important practical
problem. Coding theory provides explicit ways of ensuring that messages
remain legible even in the presence of errors. Cryptography, on the other
hand, makes sure that messages remain unreadable, except to the intended
recipient. These complementary techniques turn out to have much in common
mathematically.
being told of any corrections or possible improvements and might even part with a small
reward to the first finder of particular errors. This document is written in LaTeX2e and
stored in the file labelled ~twk/IIA/Codes.tex on emu in (I hope) read permitted form.
My e-mail address is twk@dpmms.

These notes are based on notes taken in the course of the previous lecturer Dr Pinch. 
Most of their virtues are his, most of their vices mine. Although the course makes use 
of one or two results from probability theory and a few more from algebra it is possible 
to follow the course successfully whilst taking these results on trust. There is a note on 

1 What is an error correcting code?

Originally codes were a device for making messages hard to read. The study 
of such codes and their successors is called cryptography and will form the 
subject of the last quarter of these notes. However in the 19th century the 
optical 1 and then the electrical telegraph made it possible to send messages 
speedily but at a price. That price might be specified as so much per word 
or so much per letter. Obviously it made sense to have books of ‘telegraph 
codes’ in which one five letter combination QWADR, say, meant ‘please book 
quiet room for two’ and another QWNDR meant ‘please book cheapest room 
for one’. Obviously, also, an error of one letter of a telegraph code could have 
unpleasant consequences. 

Today messages are usually sent as binary sequences like 01110010...,
but the transmission of each digit still costs money. Because of this, messages

1 See The Count of Monte Cristo and various Napoleonic sea stories. A statue to the
inventor of the optical telegraph (semaphore) used to stand somewhere in Paris but seems 
to have disappeared. 
are often ‘compressed’, that is shortened by removing redundant structure 2.
In recognition of this fact we shall assume that we are asked to consider a
collection of m messages each of which is equally likely.

Our model is the following. When the ‘source’ produces one of the m
possible messages $\mu_i$ say, it is fed into a ‘coder’ which outputs a string $c_i$ of
n binary digits. The string is then transmitted one digit at a time along a
‘communication channel’. Each digit has probability p of being mistransmitted
(so that 0 becomes 1 or 1 becomes 0) independently of what happens
to the other digits [0 < p < 1/2]. The transmitted message is then passed
through a ‘decoder’ which either produces a message $\mu_j$ (where we hope that
j = i) or an error message and passes it on to the ‘receiver’.

Exercise 1.1. Why do we not consider the case 1 > p > 1/2? What if p = 1/2?

An obvious example is the transmission of data from a distant space probe 
where (at least in the early days) the coder had to be simple and robust but 
the decoder could be as complex as the designer wished. On the other hand 
the decoder in a home CD player must be cheap but the encoding system 
which produces the disc can be very expensive. 

For most of the time we shall concentrate our attention on a code
$C \subseteq \{0,1\}^n$ consisting of the codewords $c_i$. We say that C has size m = |C|.
If m is large then we can carry a large number of possible messages (that 

2 In practice the situation is more complicated. Engineers distinguish between irreversible
‘lossy compression’ and reversible ‘lossless compression’. For compact discs where
bits are cheap the sound recorded can be reconstructed exactly. For digital sound broad- 
casting where bits are expensive the engineers make use of knowledge of the human au- 
ditory system (for example, the fact that we can not make out very soft noise in the 
presence of loud noises) to produce a result that might sound perfect (or nearly so) to us 
but which is in fact not. For mobile phones there can be greater loss of data because users 
do not demand anywhere close to perfection. For digital TV the situation is still more 
striking with reduction in data content from film to TV of anything up to a factor of 60. 
However medical and satellite pictures must be transmitted with no loss of data. Notice 
that lossless coding can be judged by absolute criteria but the merits of lossy coding can 
only be judged subjectively. 

In theory lossless compression should lead to a signal indistinguishable (from a statistical 
point of view) from a random signal. In practice this is only possible in certain applications. 
As an indication of the kind of problem involved consider TV pictures. If we know that 
what is going to be transmitted is ‘head and shoulders’ or ‘tennis matches’ or ‘cartoons’ it 
is possible to obtain extraordinary compression ratios by ‘tuning’ the compression method 
to the expected pictures but then changes from what is expected can be disastrous. At 
present digital TV encoders merely expect the picture to consist of blocks which move at 
nearly constant velocity remaining more or less unchanged from frame to frame. In this 
as in other applications we know that after compression the signal still has non-trivial 
statistical properties but we do not know enough about them to exploit this. 
is we can carry more information) but as m increases it becomes harder to
distinguish between different messages when errors occur. At one extreme,
if m = 1, errors cause us no problems (since there is only one message) but
no information is transmitted (since there is only one message). At the other
extreme, if $m = 2^n$, we can transmit lots of messages but any error moves
us from one codeword to another. We are led to the following rather natural
definition.

Definition 1.2. The information rate of C is $\frac{\log_2 m}{n}$.

Note that, since $m \le 2^n$, the information rate is never greater than 1.
Notice also that the values of the information rate when m = 1 and $m = 2^n$
agree with what we might expect.

How should our decoder work? We have assumed that all messages are
equally likely and that errors are independent (this would not be true if, for
example, errors occurred in bursts 3). Under these assumptions, a reasonable
strategy for our decoder is to guess that the codeword sent is one which 
differs in the fewest places from the string of n binary digits received. Here 
and elsewhere the discussion can be illuminated by the simple notion of a 
Hamming distance. 

Definition 1.3. If $x, y \in \{0,1\}^n$ we write

$$d(x, y) = \sum_{j=1}^{n} |x_j - y_j|$$

and call d(x,y) the Hamming distance between x and y.

Lemma 1.4. The Hamming distance is a metric. 
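
As an illustration of Definition 1.3 and Lemma 1.4, here is a minimal Python sketch. It assumes that binary strings are stored as tuples of 0s and 1s; the function names are chosen for this example only. The exhaustive check is a sanity test for small n, not a proof.

```python
from itertools import product

def hamming_distance(x, y):
    """Number of places in which the equal-length binary tuples x and y differ."""
    if len(x) != len(y):
        raise ValueError("strings must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))

def is_metric_on_cube(n):
    """Exhaustively check the metric axioms on {0,1}^n (only feasible for small n)."""
    cube = list(product((0, 1), repeat=n))
    for x in cube:
        for y in cube:
            d_xy = hamming_distance(x, y)
            # d(x, y) = 0 exactly when x = y, and d is symmetric
            if (d_xy == 0) != (x == y) or d_xy != hamming_distance(y, x):
                return False
            # triangle inequality d(x, z) <= d(x, y) + d(y, z)
            for z in cube:
                if hamming_distance(x, z) > d_xy + hamming_distance(y, z):
                    return False
    return True
```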

We now do some very simple 1A probability. 

3 For the purposes of this course we note that this problem could be tackled by permuting
the ‘bits’ of the message so that ‘bursts are spread out’. In theory we could do better
than this by using the statistical properties of such bursts. In practice this may not be 
possible. In the paradigm case of mobile phones, the properties of the transmission chan- 
nel are constantly changing and are not well understood. (Here the main restriction on 
interleaving is that it introduces time delays. One way round this is ‘frequency hopping’ in
which several users constantly swap transmission channels, ‘dividing bursts among users’.)
One desirable property of codes for mobile phone users is that they should ‘fail gracefully’, 
that is that as the error rate for the channel rises the error rate for the receiver should not 
suddenly explode. 
Lemma 1.5. We work with the coding and transmission scheme described above.
Let $c \in C$ and $x \in \{0,1\}^n$.

(i) If d(c,x) = r then

$$\Pr(x \text{ received given } c \text{ sent}) = p^r (1-p)^{n-r}.$$

(ii) If d(c,x) = r then

$$\Pr(c \text{ sent given } x \text{ received}) = A(x)\, p^r (1-p)^{n-r},$$

where A(x) does not depend on r or c.

(iii) If $c' \in C$ and $d(c',x) \ge d(c,x)$ then

$$\Pr(c \text{ sent given } x \text{ received}) \ge \Pr(c' \text{ sent given } x \text{ received})$$

with equality if and only if d(c',x) = d(c,x).

The lemma just proved justifies our use, both explicit and implicit, through- 
out what follows of the so called maximum likelihood decoding rule. 

Definition 1.6. The maximum likelihood decoding rule states that a string
$x \in \{0,1\}^n$ received by a decoder should be decoded as (one of) the codewords
at the smallest Hamming distance from x.

Notice that, although this decoding rule is mathematically attractive, it
may be impractical if C is large and there is no way of finding the codeword
at the smallest distance from a particular x without making a complete search
through all the members of C.
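
The complete search just mentioned can be sketched in a few lines of Python. The four-word code below is a hypothetical example invented for the illustration, and the function names are not from the course. The sketch also evaluates the likelihood of Lemma 1.5 (i), so that one can see the nearest codeword is also the most likely one when p < 1/2.

```python
def hamming_distance(x, y):
    return sum(a != b for a, b in zip(x, y))

def nearest_codewords(code, received):
    """Return every codeword at the smallest Hamming distance from `received`."""
    best = min(hamming_distance(c, received) for c in code)
    return [c for c in code if hamming_distance(c, received) == best]

def likelihood(c, x, p):
    """Pr(x received | c sent) = p^r (1-p)^(n-r) with r = d(c, x)."""
    r = hamming_distance(c, x)
    return p ** r * (1 - p) ** (len(x) - r)

# Hypothetical code of length 5 and size 4, used only as an example.
code = [(0, 0, 0, 0, 0), (1, 1, 1, 0, 0), (0, 0, 1, 1, 1), (1, 1, 0, 1, 1)]
received = (1, 0, 1, 0, 0)
decoded = nearest_codewords(code, received)
```

The search costs one distance computation per codeword, which is exactly why the rule becomes impractical for large C.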

2 Hamming’s breakthrough 

Although we have used simple probabilistic arguments to justify it, the max- 
imum likelihood decoding rule will enable us to avoid probabilistic consider- 
ations for the rest of the course and to concentrate on algebraic and combi- 
natorial considerations. The spirit of the course is exemplified in the next 
two definitions. 

Definition 2.1. We say that C is d error detecting if changing up to d digits 
in a codeword never produces another codeword. 

Definition 2.2. We say that C is e error correcting if knowing that a string 
of n binary digits differs from some codeword of C in at most e places we 
can deduce the codeword. 
Here are two simple schemes.

Repetition coding of length n. We take codewords of the form

c = (c, c, c, . . . , c)

with c = 0 or c = 1. The code C is n - 1 error detecting, and $\lfloor (n-1)/2 \rfloor$
error correcting. The maximum likelihood decoder chooses the symbol that
occurs most often. (Here and elsewhere $\lfloor a \rfloor$ is the largest integer $N \le a$ and
$\lceil a \rceil$ is the smallest integer $M \ge a$.) Unfortunately the information rate is
1/n which is rather low 4.

The paper tape code. Here and elsewhere it is convenient to give {0, 1} the
structure of the field $\mathbb{F}_2 = \mathbb{Z}_2$ by using arithmetic modulo 2. The codewords
have the form

$$c = (c_1, c_2, c_3, \dots, c_n)$$

with $c_1, c_2, \dots, c_{n-1}$ freely chosen elements of $\mathbb{F}_2$ and $c_n$ (the check digit)
the element of $\mathbb{F}_2$ which gives

$$c_1 + c_2 + \dots + c_{n-1} + c_n = 0.$$

The resulting code C is 1 error detecting since, if $x \in \mathbb{F}_2^n$ is obtained from
$c \in C$ by making a single error, we have

$$x_1 + x_2 + \dots + x_{n-1} + x_n = 1.$$

However it is not error correcting since, if

$$x_1 + x_2 + \dots + x_{n-1} + x_n = 1,$$

there are n codewords y with Hamming distance d(x, y) = 1. The information
rate is (n - 1)/n. Traditional paper tape had 8 places per line each of
which could have a punched hole or not so n = 8.
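
A minimal Python sketch of the paper tape code with n = 8 follows. The helper names and the sample message are assumptions made for the illustration. It shows a single error being detected, a double error slipping through, and the reason no correction is possible: every one of the n single-digit changes to a corrupted line gives a codeword.

```python
def add_check_digit(message_bits):
    """Append the check digit that makes the sum of all digits 0 modulo 2."""
    return list(message_bits) + [sum(message_bits) % 2]

def passes_parity_check(word):
    return sum(word) % 2 == 0

line = add_check_digit([1, 0, 1, 1, 0, 0, 1])   # n = 8, as on paper tape

one_error = line[:]
one_error[2] ^= 1            # a single mistransmitted digit is detected

two_errors = one_error[:]
two_errors[5] ^= 1           # a second error restores even parity

# Flipping any single place of the corrupted line yields a codeword,
# so there are n = 8 candidates at Hamming distance 1.
neighbours = [one_error[:i] + [one_error[i] ^ 1] + one_error[i + 1:] for i in range(8)]
```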

Exercise 2.3. Machines tend to communicate in binary strings so this course 
concentrates on binary alphabets with two symbols. There is no particu- 
lar difficulty in extending our ideas to alphabets with n symbols though, of 
course, some tricks will only work for particular values of n. If you look at 
the inner title page of almost any recent book you will find its International 
Standard Book Number (ISBN). The ISBN uses single digits selected from 0,
1, ... , 8, 9 and X representing 10. Each ISBN consists of nine such digits
$a_1, a_2, \dots, a_9$ followed by a single check digit $a_{10}$ chosen so that

$$10a_1 + 9a_2 + \dots + 2a_9 + a_{10} \equiv 0 \mod 11. \qquad (*)$$

4 Compare the chorus ‘Oh no John, no John, no John no’. 
(In more sophisticated language our code C consists of those elements $a \in \mathbb{F}_{11}^{10}$
such that $\sum_{j=1}^{10} (11 - j) a_j = 0$.)

(i) Find a couple of books 5 and check that (*) holds for their ISBNs 6.

(ii) Show that (*) will not work if you make a mistake in writing down
one digit of an ISBN.

(iii) Show that (*) may fail to detect two errors.

(iv) Show that (*) will not work if you interchange two adjacent digits.
Errors of type (ii) and (iv) are the most common in typing 7. In communication
between publishers and booksellers both sides are anxious that errors
should be detected but would prefer the other side to query errors rather than 
to guess what the error might have been. 
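
For part (i), the check (*) is easy to mechanise. The following Python sketch assumes the ISBN is typed as a string, possibly with hyphens; the function names are invented here. The sample number 0-306-40615-2 is used only because it satisfies (*), as the arithmetic confirms.

```python
def isbn10_digits(isbn):
    """Convert an ISBN-10 string to ten integers, reading X as 10 and skipping hyphens."""
    chars = [ch for ch in isbn.upper() if ch not in "- "]
    if len(chars) != 10:
        raise ValueError("an ISBN-10 has ten digits")
    return [10 if ch == "X" else int(ch) for ch in chars]

def isbn10_is_valid(isbn):
    """Check 10*a1 + 9*a2 + ... + 2*a9 + a10 = 0 mod 11."""
    digits = isbn10_digits(isbn)
    return sum((10 - j) * a for j, a in enumerate(digits)) % 11 == 0

print(isbn10_is_valid("0-306-40615-2"))   # a number satisfying (*)
print(isbn10_is_valid("0-306-40615-3"))   # one digit changed
print(isbn10_is_valid("0-306-40165-2"))   # two adjacent digits interchanged
print(isbn10_is_valid("1-306-40615-3"))   # two digits changed, chosen to cancel mod 11
```

The last line illustrates part (iii): raising $a_1$ and $a_{10}$ by one each adds 10 + 1 = 11 to the left-hand side of (*).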

Hamming had access to an early electronic computer but was low down 
in the priority list of users. He would submit his programs encoded on paper 
tape to run over the weekend but often he would have his tape returned 
on Monday because the machine had detected an error in the tape. ‘If the 
machine can detect an error’ he asked himself ‘why can the machine not 
correct it?’ and he came up with the following scheme. 

Hamming's original code. We work in $\mathbb{F}_2^7$. The codewords c are chosen to
satisfy the three conditions

$$\begin{aligned}
c_1 + c_3 + c_5 + c_7 &= 0 \\
c_2 + c_3 + c_6 + c_7 &= 0 \\
c_4 + c_5 + c_6 + c_7 &= 0.
\end{aligned}$$

By inspection we may choose $c_3$, $c_5$, $c_6$ and $c_7$ freely and then $c_1$, $c_2$ and $c_4$
are completely determined. The information rate is thus 4/7.

Suppose that we receive the string $x \in \mathbb{F}_2^7$. We form the syndrome
$(z_1, z_2, z_4) \in \mathbb{F}_2^3$ given by

$$\begin{aligned}
z_1 &= x_1 + x_3 + x_5 + x_7 \\
z_2 &= x_2 + x_3 + x_6 + x_7 \\
z_4 &= x_4 + x_5 + x_6 + x_7.
\end{aligned}$$

If x is a codeword then $(z_1, z_2, z_4) = (0,0,0)$. If c is a codeword and the
Hamming distance d(x, c) = 1 then the place in which x differs from c is
given by $z_1 + 2z_2 + 4z_4$ (using ordinary addition, not addition modulo 2) as

5 In case of difficulty your college library may be of assistance. 

6 In fact, X is only used in the check digit place. 

7 Thus the 1997-8 syllabus for this course contains the rather charming misprint of
‘snydrome’ for ‘syndrome’. 
may be easily checked using linearity and a case by case study of the seven
binary sequences x containing one 1 and six 0s. The Hamming code is thus
1 error correcting.
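
Hamming's scheme can be written out directly from the three parity conditions. The Python sketch below is one possible rendering; the function names are assumptions, and a received string is stored as a list [x1, ..., x7]. The place of a single error is read off as $z_1 + 2z_2 + 4z_4$ because place k contributes to $z_1$, $z_2$, $z_4$ exactly according to the binary digits of k.

```python
def hamming_encode(c3, c5, c6, c7):
    """Fill in the check digits c1, c2, c4 from the freely chosen c3, c5, c6, c7."""
    c1 = (c3 + c5 + c7) % 2
    c2 = (c3 + c6 + c7) % 2
    c4 = (c5 + c6 + c7) % 2
    return [c1, c2, c3, c4, c5, c6, c7]

def syndrome(x):
    """Return (z1, z2, z4) for a received list x = [x1, ..., x7]."""
    x1, x2, x3, x4, x5, x6, x7 = x
    return ((x1 + x3 + x5 + x7) % 2,
            (x2 + x3 + x6 + x7) % 2,
            (x4 + x5 + x6 + x7) % 2)

def hamming_correct(x):
    """Correct at most one error: flip the place z1 + 2*z2 + 4*z4 if it is non-zero."""
    z1, z2, z4 = syndrome(x)
    place = z1 + 2 * z2 + 4 * z4      # ordinary addition, not modulo 2
    corrected = list(x)
    if place:
        corrected[place - 1] ^= 1     # places are numbered from 1
    return corrected
```

If two errors occur the same procedure still returns a codeword, but not the one that was sent.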

Exercise 2.4. Suppose we use eight hole tape with the standard paper tape
code and the probability that an error occurs at a particular place on the tape
(i.e. a hole occurs where it should not or fails to occur where it should) is
$10^{-4}$. A program requires about 10 000 lines of tape (each line containing
eight places) using the paper tape code. Using the Poisson approximation,
direct calculation (possible with a hand calculator but really no advance on
the Poisson method) or otherwise show that the probability that the tape will
be accepted as error free by the decoder is less than 0.04%.

Suppose now that we use the Hamming scheme (making no use of the last
place in each line). Explain why the program requires about 17 500 lines of
tape but that any particular line will be correctly decoded with probability about
$1 - (21 \times 10^{-8})$ and the probability that the entire program will be correctly
decoded is better than 99.6%.
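
The figures quoted in Exercise 2.4 can be checked numerically. The Python sketch below is a sanity check by direct calculation, not the argument the exercise asks for. It assumes that a paper tape line is accepted exactly when it contains an even number of errors, and that a Hamming line is decoded correctly exactly when it contains at most one error; the variable names are invented here.

```python
from math import comb

p = 1e-4   # probability of an error at any one place

# Paper tape code: a line of 8 places passes the parity check exactly when
# it contains an even number of errors.
p_line_accepted = sum(comb(8, k) * p**k * (1 - p)**(8 - k) for k in range(0, 9, 2))
p_tape_accepted = p_line_accepted ** 10_000

# Hamming scheme: 4 message digits per line instead of 7.
hamming_lines = 10_000 * 7 // 4

# A line of 7 places is decoded correctly when it contains at most one error.
p_line_correct = (1 - p)**7 + 7 * p * (1 - p)**6
p_program_correct = p_line_correct ** hamming_lines

print(p_tape_accepted, hamming_lines, 1 - p_line_correct, p_program_correct)
```

The leading term of 1 - p_line_correct is $\binom{7}{2}p^2 = 21 \times 10^{-8}$, the probability of exactly two errors in a line.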

Hamming’s scheme is easy to implement. It took a little time for his com- 
pany to realise what he had done 8 but they were soon trying to patent it. 
In retrospect the idea of an error correcting code seems obvious (Hamming’s 
scheme had actually been used as the basis of a Victorian party trick) and 
indeed two or three other people discovered it independently, but Hamming 
and his co-discoverers had done more than find a clever answer to a question.
They had asked an entirely new question and opened a new field for
mathematics and engineering. 

The times were propitious for the development of the new field. Before
1940 error correcting codes would have been luxuries, solutions looking
for problems; after 1950, with the rise of the computer and new communication
technologies, they became necessities. Mathematicians and engineers
returning from wartime duties in code breaking, code making and general 
communications problems were primed to grasp and extend the ideas. The 
mathematical engineer Claude Shannon may be considered the presiding ge- 
nius of the new field. 


3 General considerations 

How good can error correcting and error detecting codes be? The following 
discussion is a natural development of the ideas we have already discussed. 

8 Experienced engineers came away from working demonstrations muttering ‘I still don’t
believe it’. 
Definition 3.1. The minimum distance d of a code is the smallest Hamming
distance between distinct code words.

We call a code of length n, size m and distance d a [n, m, d] code. Less
briefly, a set $C \subseteq \mathbb{F}_2^n$, with |C| = m and

$$\min\{d(x, y) : x, y \in C,\ x \neq y\} = d$$

is called a [n, m, d] code. By an [n, m] code we shall simply mean a code of
length n and size m.

Lemma 3.2. A code of minimum distance d can detect d - 1 errors and
correct $\lfloor \frac{d-1}{2} \rfloor$ errors. It cannot detect all sets of d errors and cannot correct
all sets of $\lfloor \frac{d-1}{2} \rfloor + 1$ errors.

It is natural here and elsewhere to make use of the geometrical insight 
provided by the (closed) Hamming ball 

$$H(x, r) = \{y : d(x, y) \le r\}.$$

Observe that

$$|H(x, r)| = |H(0, r)|$$

for all x and so writing

$$V(n, r) = |H(0, r)|$$

we know that V(n, r) is the number of points in any Hamming ball of radius
r. A simple counting argument shows that

$$V(n, r) = \sum_{j=0}^{r} \binom{n}{j}.$$

Theorem 3.3 (Hamming’s bound). If a code C is e error correcting then

$$|C| \le \frac{2^n}{V(n, e)}.$$

There is an obvious fascination (if not utility) in the search for codes 
which attain the exact Hamming bound. 

Definition 3.4. A code C of length n and size m which can correct e errors
is called perfect if

$$m = \frac{2^n}{V(n, e)}.$$
Lemma 3.5. Hamming's original code is a [7, 16, 3] code. It is perfect.

It may be worth remarking in this context that if a code which can correct
e errors is perfect (i.e. has a perfect packing of Hamming balls of radius e)
then the decoder must invariably give the wrong answer when presented with
e + 1 errors. We note also that if (as will usually be the case) $2^n/V(n,e)$ is
not an integer no perfect e error correcting code can exist.
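
The quantities V(n, r) and $2^n/V(n, e)$ are easy to compute exactly with integer arithmetic. The Python sketch below uses function names invented for the illustration. It confirms that the parameters of Hamming's original code meet the bound of Theorem 3.3 with equality, and that divisibility fails for, say, n = 8 and e = 1.

```python
from math import comb

def ball_volume(n, r):
    """V(n, r): number of points of {0,1}^n within Hamming distance r of a fixed point."""
    return sum(comb(n, j) for j in range(r + 1))

def hamming_bound(n, e):
    """Largest size allowed by Hamming's bound for an e error correcting code of length n."""
    return 2**n // ball_volume(n, e)

def sphere_packing_is_exact(n, e):
    """True when 2^n / V(n, e) is an integer, a necessary condition for a perfect code."""
    return 2**n % ball_volume(n, e) == 0

print(ball_volume(7, 1), hamming_bound(7, 1))   # V(7, 1) = 1 + 7 and 2^7 / 8
print(sphere_packing_is_exact(8, 1))            # V(8, 1) = 9 does not divide 2^8
```

Integrality is only a necessary condition: the case n = 90, e = 2 has $V(90, 2) = 1 + 90 + 4005 = 2^{12}$, yet no perfect code with these parameters exists.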

Exercise 3.6. Even if $2^n/V(n, e)$ is an integer, no perfect code may exist.

(i) Verify that

$$\frac{2^{90}}{V(90, 2)} = 2^{78}.$$

(ii) Suppose that C is a perfect 2 error correcting code of length 90 and
size $2^{78}$. Explain why we may suppose without loss of generality that $0 \in C$.

(iii) Let C be as in (ii) with $0 \in C$. Consider the set

$$X = \{x \in \mathbb{F}_2^{90} : x_1 = 1,\ x_2 = 1,\ d(0, x) = 3\}.$$

Show that corresponding to each $x \in X$ we can find a unique $c(x) \in C$ such
that d(c(x), x) = 2.

(iv) Continuing with the argument of (iii) show that

d(c(x), 0) = 5

and that $c_i(x) = 1$ whenever $x_i = 1$. By looking at d(c(x), c(x')) for $x, x' \in X$
and invoking the Dirichlet pigeon-hole principle, or otherwise, obtain a
contradiction.

(v) Conclude that there is no perfect $[90, 2^{78}]$ code.

We obtained the Hamming bound which places an upper bound on how 
good a code can be by a packing argument. A covering argument gives us 
the GSV (Gilbert, Shannon, Varshamov) bound in the opposite direction. 
Let us write A(n, d) for the size of the largest code with minimum distance 
d. 

Theorem 3.7 (Gilbert, Shannon, Varshamov). We have

$$A(n, d) \ge \frac{2^n}{V(n, d-1)}.$$






Until recently there were no general explicit constructions for codes which
achieved the GSV bound (i.e. codes whose minimum distance d satisfied the
inequality $A(n,d)\,V(n,d-1) \ge 2^n$). Such a construction was finally found
by Garcia and Stichtenoth by using ‘Goppa’ codes.
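
The covering argument behind Theorem 3.7 can be imitated by a greedy search, sketched below in Python. This is an illustration for very small n only (it enumerates all $2^n$ strings) and is not one of the explicit constructions referred to above; the function names are assumptions. When the search stops, every string lies within distance d - 1 of some chosen word, so the balls of radius d - 1 cover $\mathbb{F}_2^n$ and the code has at least $2^n/V(n, d-1)$ members.

```python
from itertools import product
from math import comb

def hamming_distance(x, y):
    return sum(a != b for a, b in zip(x, y))

def greedy_code(n, d):
    """Scan {0,1}^n in lexicographic order, keeping each word that is at
    distance at least d from every word already kept."""
    code = []
    for x in product((0, 1), repeat=n):
        if all(hamming_distance(x, c) >= d for c in code):
            code.append(x)
    return code

def gsv_lower_bound(n, d):
    """Smallest integer that is at least 2^n / V(n, d - 1)."""
    volume = sum(comb(n, j) for j in range(d))
    return -(-2**n // volume)

code = greedy_code(7, 3)
print(len(code), gsv_lower_bound(7, 3))
```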

Engineers are, of course, interested in ‘best codes’ of length n for reasonably
small values of n but mathematicians are particularly interested in what
happens as $n \to \infty$. To see what we should look at recall the so called weak
law of large numbers (a simple consequence of Chebychev’s inequality). In 
our case it yields the following result. 

Lemma 3.8. Consider the model of a noisy transmission channel used in
this course in which each digit has probability p of being wrongly transmitted
independently of what happens to the other digits. If $\epsilon > 0$ then

$$\Pr(\text{number of errors in transmission for message of } n \text{ digits} \ge (1+\epsilon)pn) \to 0$$

as $n \to \infty$.

By Lemma 3.2 a code of minimum distance d can correct $\lfloor \frac{d-1}{2} \rfloor$ errors.
Thus if we have an error rate p and $\epsilon > 0$ we know that the probability that
a code of length n with error correcting capacity $\lceil (1+\epsilon)pn \rceil$ will fail
to correct a transmitted message falls to zero as $n \to \infty$. By definition the
biggest code with minimum distance $\lceil 2(1+\epsilon)pn \rceil$ has size $A(n, \lceil 2(1+\epsilon)pn \rceil)$
and so has information rate $\log_2 A(n, \lceil 2(1+\epsilon)pn \rceil)/n$. Study of the behaviour
of $\log_2 A(n, n\delta)/n$ will thus tell us how large an information rate is possible
in the presence of a given error rate.


Definition 3.9. If $0 < \delta < 1/2$ we write

$$\alpha(\delta) = \limsup_{n \to \infty} \frac{\log_2 A(n, n\delta)}{n}.$$


Definition 3.10. We define the entropy function $H : [0, 1/2) \to \mathbb{R}$ by
H(0) = 0 and

$$H(\delta) = -\delta \log_2(\delta) - (1-\delta)\log_2(1-\delta)$$

for all $0 < \delta < 1/2$.

(Our function H is a very special case of a general measure of disorder.)

Theorem 3.11. With the definitions just given,

$$1 - H(\delta) \le \alpha(\delta) \le 1 - H(\delta/2)$$

for all $0 < \delta < 1/2$.
Using the Hamming bound (Theorem 3.3) and the GSV bound (Theorem 3.7)
we see that Theorem 3.11 follows at once from the following result.

Theorem 3.12. We have

$$\frac{\log_2 V(n, n\delta)}{n} \to H(\delta)$$

as $n \to \infty$.

Our proof of Theorem 3.12 depends, as one might expect, on a version of
Stirling’s formula. We only need the very simplest version proved in 1A.

Lemma 3.13 (Stirling). We have

$$\log_e n! = n \log_e n - n + O(\log_2 n).$$

We combine this with the remark that

$$V(n, n\delta) = \sum_{0 \le j \le n\delta} \binom{n}{j}$$

and that very simple estimates give

$$\binom{n}{m} \le \sum_{0 \le j \le n\delta} \binom{n}{j} \le (m+1)\binom{n}{m}$$

where $m = \lfloor n\delta \rfloor$.

Although the GSV bound is very important a stronger result can be 
obtained for the error correcting power of the best long codes. 

Theorem 3.14 (Shannon’s coding theorem). Suppose 0 < p < 1/2 and
$\eta > 0$. Then there exists an $n_0(p, \eta)$ such that for any $n > n_0$ we can find
codes of length n which have the property that (under our standard model) the
probability that a codeword is mistaken is less than $\eta$ and have information
rate $1 - H(p) - \eta$.


[WARNING: Do not use this result until you have studied its proof. It is 
indeed a beautiful and powerful result but my statement conceals some traps 
for the unwary.] 

Thus in our standard setup, by using sufficiently long code words, we can
simultaneously obtain an information rate as close to 1 - H(p) as we please
and an error rate as close to 0 as we please. Shannon’s proof uses the
kind of ideas developed in this section with an extra pinch of probability (he
chooses codewords at random) but I shall not give it in the course. There is
a nice simple treatment in Chapter 3 of [9].

In view of Hamming’s bound it is not surprising that it can also be shown
that we cannot drive the error rate down to close to zero and maintain an
information rate strictly greater than 1 - H(p). To sum up, our standard setup has
capacity 1 - H(p). We can communicate reliably at any fixed information rate
below this capacity but not at any rate above. However, Shannon’s theorem,
which tells us that rates less than 1 - H(p) are possible, is non-constructive and
does not tell us explicitly how to achieve these rates.

4 Linear codes 

Just as $\mathbb{R}^n$ is a vector space over $\mathbb{R}$ and $\mathbb{C}^n$ is a vector space over $\mathbb{C}$ so $\mathbb{F}_2^n$ is a
vector space over $\mathbb{F}_2$. (If you know about vector spaces over fields, so
much the better, if not just follow the obvious paths.) A linear code is a
subspace of $\mathbb{F}_2^n$. More formally we have the following definition.

Definition 4.1. A linear code is a subset C of $\mathbb{F}_2^n$ such that

(i) $0 \in C$,

(ii) if $x, y \in C$ then $x + y \in C$.

Note that if $\lambda \in \mathbb{F}_2$ then $\lambda = 0$ or $\lambda = 1$ so that condition (i) of the
definition just given guarantees that $\lambda x \in C$ whenever $x \in C$. We shall see
that linear codes have many useful properties.

Example 4.2. (i) The repetition code with

$$C = \{x : x = (x, x, \dots, x)\}$$

is a linear code.

(ii) The paper tape code

$$C = \Big\{x : \sum_{j=1}^{n} x_j = 0\Big\}$$

is a linear code.

(iii) Hamming's original code is a linear code.

The verification is easy. In fact, examples (ii) and (iii) are ‘parity check
codes’ and so automatically linear as we shall see from the next lemma.
Definition 4.3. Consider a set P in $\mathbb{F}_2^n$. We say that C is the code defined
by the set of parity checks P if the elements of C are precisely those $x \in \mathbb{F}_2^n$
with

$$\sum_{j=1}^{n} p_j x_j = 0$$

for all $p \in P$.

Lemma 4.4. If C is a code defined by parity checks then C is linear.

We now prove the converse result. 

Definition 4.5. If C is a linear code we write $C^\perp$ for the set of $p \in \mathbb{F}_2^n$ such
that

$$\sum_{j=1}^{n} p_j x_j = 0$$

for all $x \in C$.

Thus $C^\perp$ is the set of parity checks satisfied by C.

Lemma 4.6. If C is a linear code then

(i) $C^\perp$ is a linear code,

(ii) $(C^\perp)^\perp \supseteq C$.

We call $C^\perp$ the dual code to C.

In the language of the last part of the course on linear mathematics (P1),
$C^\perp$ is the annihilator of C. The following is a standard theorem of that
course.

Lemma 4.7. If C is a linear code in $\mathbb{F}_2^n$ then

$$\dim C + \dim C^\perp = n.$$

Since the last part of P1 is not the most popular piece of mathematics in
IB we shall give an independent proof later (see the note after Lemma 4.13).
Combining Lemma 4.6 (ii) with Lemma 4.7 we get the following corollaries.

Lemma 4.8. If C is a linear code then $(C^\perp)^\perp = C$.

Lemma 4.9. Every linear code is defined by parity checks. 

Our treatment of linear codes has been rather abstract. In order to put 
computational flesh on the dry theoretical bones we introduce the notion of 
a generator matrix. 
Definition 4.10. If C is a linear code of length n any r × n matrix whose
rows form a basis for C is called a generator matrix for C. We say that C 
has dimension or rank r. 

Example 4.11. As examples we can find generator matrices for the repeti- 
tion code, the paper tape code and the original Hamming code. 

Remember that the Hamming code is the code of length 7 given by the
parity conditions

$$\begin{aligned}
x_1 + x_3 + x_5 + x_7 &= 0 \\
x_2 + x_3 + x_6 + x_7 &= 0 \\
x_4 + x_5 + x_6 + x_7 &= 0.
\end{aligned}$$

By using row operations and column permutations to perform Gaussian
elimination we can give a constructive proof of the following lemma.

Lemma 4.12. Any linear code of length n has (possibly after permuting the
order of coordinates) a generator matrix of the form

$$(I \mid B).$$

Notice that this means that any codeword x can be written as

$$(y \mid z) = (y \mid yB)$$

where $y = (y_1, y_2, \dots, y_r)$ may be considered as the message and the vector
z = yB of length n - r may be considered the check digits. Any code whose
codewords can be split up in this manner is called systematic.
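
Systematic encoding amounts to one matrix multiplication over $\mathbb{F}_2$. In the Python sketch below the function name and the particular matrix B are assumptions made for the illustration. This B was chosen so that the resulting code of length 7 is, up to a permutation of coordinates, Hamming's original code.

```python
def encode_systematic(y, B):
    """Return (y | yB) over F_2, where B is an r x (n - r) matrix given as a list of rows."""
    checks = [sum(y[i] * B[i][k] for i in range(len(y))) % 2
              for k in range(len(B[0]))]
    return list(y) + checks

# Hypothetical example with r = 4 and n = 7.
B = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1],
     [1, 1, 1]]

word = encode_systematic([1, 0, 1, 1], B)
print(word)   # the message digits followed by three check digits
```

The first r digits of each codeword are the message itself, so a receiver who trusts the channel can read the message without any computation.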

We now give a more computational treatment of parity checks. 

Lemma 4.13. If C is a linear code of length n with generator matrix G then
$a \in C^\perp$ if and only if

$$G a^T = 0^T.$$

Thus

$$C^\perp = (\ker G)^T.$$

Thus using the rank, nullity theorem we get a second proof of Lemma 4.7.
Lemma 4.13 also enables us to characterise $C^\perp$.
Lemma 4.14. If C is a linear code of length n and dimension r with generator
the r × n matrix G then if H is any n × (n - r) matrix with columns
forming a basis of ker G we know that H is a parity check matrix for C and
its transpose $H^T$ is a generator for $C^\perp$.
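
When the generator has the systematic form $(I \mid B)$, a matrix H as in Lemma 4.14 can be written down at once: stack B on top of an identity matrix. Over $\mathbb{F}_2$ this gives GH = B + B = 0, and the identity block makes the n - r columns of H linearly independent. The Python sketch below checks this for one B; the helper names and the choice of B are assumptions for the illustration.

```python
def identity(k):
    return [[int(i == j) for j in range(k)] for i in range(k)]

def mat_mul_mod2(A, M):
    """Matrix product over F_2 for matrices stored as lists of rows."""
    return [[sum(a * m for a, m in zip(row, col)) % 2 for col in zip(*M)]
            for row in A]

def systematic_pair(B):
    """From B build G = (I | B) and H = B stacked on top of I, so that GH = 0 over F_2."""
    r, k = len(B), len(B[0])
    G = [identity(r)[i] + B[i] for i in range(r)]
    H = [row[:] for row in B] + identity(k)
    return G, H

# Hypothetical B with r = 4, n - r = 3.
B = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
G, H = systematic_pair(B)
dual_generator = [list(col) for col in zip(*H)]   # H^T, a generator for the dual code
```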

Example 4.15. (i) The dual of the paper tape code is the repetition code.

(ii) Hamming's original code has dual with generator

$$\begin{pmatrix}
1 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 1 & 1 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 & 1 & 1 & 1
\end{pmatrix}.$$

We saw above that the codewords of a linear code can be written 

(y|z) = (y|y B) 

where y may be considered as the vector of message digits and z = y B as the 
vector of check digits. Thus encoders for linear codes are easy to construct. 

What about decoders? Recall that every linear code of length n has a
(non-unique) associated parity check matrix H with the property that $x \in C$
if and only if xH = 0. If $z \in \mathbb{F}_2^n$ we define the syndrome of z to be zH. The
following lemma is mathematically trivial but forms the basis of the method
of syndrome decoding.

Lemma 4.16. Let C be a linear code with parity check matrix H. If we are
given z = x + e where x is a code word and the ‘error vector’ $e \in \mathbb{F}_2^n$ then

zH = eH.

Suppose we have tabulated the syndrome uH for all u with ‘few’ non-zero 
entries (say, all u with d(u, 0) < K). If our decoder receives z it computes 
the syndrome zH. If the syndrome is zero then z G C and the decoder 
assumes the transmitted message was z. If the syndrome of the received 
message is a non-zero vector w the decoder searches its list until it finds an 
e with eH = w. The decoder then assumes that the transmitted message 
was x = z — e (note that z — e will always be a codeword, even if not the 
right one). This procedure will fail if w does not appear in the list but for
this to be the case at least K + 1 errors must have occurred.

If we take K = 1, that is we only want a 1 error correcting code, then
writing $e^{(i)}$ for the vector in $\mathbb{F}_2^n$ with 1 in the ith place and 0 elsewhere we
see that the syndrome $e^{(i)}H$ is the ith row of H. If the transmitted message
z has syndrome zH equal to the ith row of H then the decoder assumes that
there has been an error in the ith place and nowhere else. (Recall the special
case of Hamming’s original code.)
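
The table-driven procedure described above can be sketched as follows in Python. The function names are assumptions, and H is taken to be the 7 × 3 matrix whose ith row holds the binary digits of i, which reproduces the parity conditions of Hamming's original code. With K = 1 the table has one entry for the zero syndrome and one for each row of H.

```python
from itertools import combinations

def syndrome(z, H):
    """Row vector z times the n x (n - r) parity check matrix H, over F_2."""
    return tuple(sum(zi * hi for zi, hi in zip(z, col)) % 2 for col in zip(*H))

def build_table(H, K):
    """Map syndrome -> error vector for error vectors of weight at most K.
    Lower weights are entered first, so each syndrome keeps a lightest e."""
    n = len(H)
    table = {}
    for weight in range(K + 1):
        for places in combinations(range(n), weight):
            e = tuple(int(i in places) for i in range(n))
            table.setdefault(syndrome(e, H), e)
    return table

def decode(z, H, table):
    """Return z - e for the tabulated e with eH = zH, or None if zH is not listed."""
    e = table.get(syndrome(z, H))
    if e is None:
        return None
    return tuple((zi + ei) % 2 for zi, ei in zip(z, e))

# Row i (i = 1, ..., 7) holds the binary digits of i, least significant first.
H = [[(i >> b) & 1 for b in range(3)] for i in range(1, 8)]
table = build_table(H, 1)
```

For larger K the table grows like V(n, K), which is the cost referred to in the next paragraph.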

If K is large the task of searching the list of possible syndromes becomes
onerous and, unless (as sometimes happens) we can find another trick, we
find that ‘decoding becomes dear’ although ‘encoding remains cheap’.

We conclude this section by looking at weights and the weight enumeration
polynomial for a linear code. The idea here is to exploit the fact that if
C is a linear code and $a \in C$ then a + C = C. Thus the ‘view of C’ from any
codeword a is the same as the ‘view of C’ from the particular codeword 0.

Definition 4.17. The weight w(x) of a vector $x \in \mathbb{F}_2^n$ is given by

w(x) = d(0, x).

Lemma 4.18. If w is the weight function on $\mathbb{F}_2^n$ and $x, y \in \mathbb{F}_2^n$ then

(i) $w(x) \ge 0$,

(ii) w(x) = 0 if and only if x = 0,

(iii) $w(x) + w(y) \ge w(x + y)$.

Since the minimum (non-zero) weight in a linear code is the same as the 
minimum (non-zero) distance we can talk about linear codes of minimum 
weight d when we mean linear codes of minimum distance d. 

The pattern of distances in a linear code is encapsulated in the weight 
enumeration polynomial. 

Definition 4.19. Let C be a linear code of length n. We write $A_j$ for the
number of codewords of weight j and define the weight enumeration polynomial
$W_C$ to be the polynomial in two real variables given by

$$W_C(s, t) = \sum_{j=0}^{n} A_j s^j t^{n-j}.$$

Here are some simple properties of $W_C$.

Lemma 4.20. Under the assumptions and with the notation of
Definition 4.19, the following results are true.

(i) $W_C$ is a homogeneous polynomial of degree n.

(ii) If C has rank r then $W_C(1, 1) = 2^r$.

(iii) $W_C(0, 1) = 1$.

(iv) $W_C(1, 0)$ takes the value 0 or 1.

(v) $W_C(s, t) = W_C(t, s)$ for all s and t if and only if $W_C(1, 0) = 1$.
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Lemma 4.21. For our standard model of communication along an error 
prone channel with independent errors of probability p and a linear code C 
of length n, 

Wc(p, 1 — p) = Pr {receive a code word \ code word transmitted) 


and 

Pr (receive incorrect code word \ code word transmitted) = Wc(p, 1 — p) — (1 — p) n . 

Example 4.22. (i) If $C$ is the repetition code, $W_C(s,t) = s^n + t^n$.

(ii) If $C$ is the paper tape code of length $n$, $W_C(s,t) = \frac{1}{2}\left((s+t)^n + (t-s)^n\right)$.

Example 4.22 is a special case of the MacWilliams identity.

Theorem 4.23 (MacWilliams identity). If $C$ is a linear code then

$$W_{C^\perp}(s,t) = 2^{-\dim C}\, W_C(t-s,\, t+s).$$

We shall not give a proof and even the result may be considered as starred. 
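
Although no proof is given, the identity is easy to test numerically on a small code. The following is a minimal sketch, assuming codewords are stored as tuples of 0s and 1s; the helper names (`span`, `dual`, `weight_distribution`, `W`) are our own and not part of the notes. It computes the weight distributions of the repetition code of length 5 and of its dual (the paper tape code), so that Theorem 4.23 can be compared with Example 4.22.

```python
from itertools import product

def span(generators, n):
    """All codewords of the binary linear code spanned by the generators."""
    words = set()
    for coeffs in product((0, 1), repeat=len(generators)):
        w = [0] * n
        for c, g in zip(coeffs, generators):
            if c:
                w = [(a + b) % 2 for a, b in zip(w, g)]
        words.add(tuple(w))
    return words

def dual(code, n):
    """All vectors of F_2^n orthogonal to every codeword (brute force)."""
    return {x for x in product((0, 1), repeat=n)
            if all(sum(a * b for a, b in zip(x, c)) % 2 == 0 for c in code)}

def weight_distribution(code, n):
    """A[j] = number of codewords of weight j."""
    A = [0] * (n + 1)
    for w in code:
        A[sum(w)] += 1
    return A

def W(A, s, t):
    """Weight enumeration polynomial sum_j A_j s^j t^(n-j) evaluated at (s, t)."""
    n = len(A) - 1
    return sum(A[j] * s**j * t**(n - j) for j in range(n + 1))

n = 5
repetition = span([(1,) * n], n)            # rank 1, so 2^(-dim C) = 1/2
A = weight_distribution(repetition, n)
A_dual = weight_distribution(dual(repetition, n), n)

# MacWilliams: 2^(dim C) * W_dual(s, t) should equal W_C(t - s, t + s).
holds = all(2 * W(A_dual, s, t) == W(A, t - s, t + s)
            for s in range(-3, 4) for t in range(-3, 4))
```

Checking the identity at integer points avoids rounding issues; since both sides are polynomials of degree 5, agreement on this grid is already strong evidence.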

5 Some general constructions 

However interesting the theoretical study of codes may be to a pure mathe- 
matician, the engineer would prefer to have an arsenal of practical codes so 
that he or she can select the one most suitable for the job in hand. In this 
section we discuss the general Hamming codes and the Reed-Muller codes as 
well as some simple methods of obtaining new codes from old. 

Definition 5.1. Let $d$ be a strictly positive integer and let $n = 2^d - 1$. Consider
the (column) vector space $D = \mathbb{F}_2^d$. Write down a $d \times n$ matrix $H$
whose columns are the $2^d - 1$ distinct non-zero vectors of $D$. The Hamming
$(n, n-d)$ code is the linear code of length $n$ with $H$ as parity check matrix.

Of course the Hamming ( n , n — d) code is only defined up to permutation 
of coordinates. We note that H has rank d so a simple use of the rank nullity 
theorem shows that our notation is consistent. 

Lemma 5.2. The Hamming $(n, n-d)$ code is a linear code of length $n$ and
rank $n - d$ [$n = 2^d - 1$].

Example 5.3. The Hamming (7, 4) code is the original Hamming code.

The fact that any two columns of $H$ are linearly independent and a look at the
appropriate syndromes gives us the main property of the general Hamming
code.

Lemma 5.4. The Hamming $(n, n-d)$ code has minimum weight 3 and is a
perfect 1 error correcting code [$n = 2^d - 1$].

Hamming codes are ideal in situations where very long strings of binary 
digits must be transmitted but the chance of an error in any individual digit 
is very small. (Look at Exercise 2.4.) It may be worth remarking that, apart 
from the Hamming codes there are only a few (and, in particular, a finite 
number) of examples of perfect codes known. 
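
Definition 5.1 and Lemma 5.4 can be made concrete with a short sketch. We assume one particular ordering of the columns of $H$ (column $k$ is the binary expansion of $k$), which is only one of the permutations allowed by the definition; the function names are our own. With this ordering the syndrome of a single error, read as a binary number, is the position of the error.

```python
from itertools import product

def hamming_check_matrix(d):
    """d x n parity check matrix; column k (1-based) is the binary expansion of k."""
    n = 2**d - 1
    return [[(col >> row) & 1 for col in range(1, n + 1)] for row in range(d)]

def syndrome(H, x):
    """The vector Hx over F_2."""
    return [sum(h * xi for h, xi in zip(row, x)) % 2 for row in H]

def correct(H, received):
    """Correct at most one error: the syndrome equals the column of H at the error."""
    s = syndrome(H, received)
    pos = sum(bit << row for row, bit in enumerate(s))  # 0 means zero syndrome
    fixed = list(received)
    if pos:
        fixed[pos - 1] ^= 1
    return fixed

H = hamming_check_matrix(3)
# The Hamming (7, 4) code: all vectors with zero syndrome.
code = [list(x) for x in product((0, 1), repeat=7) if not any(syndrome(H, x))]
```

Every one of the $2^7$ possible received words is either a codeword or differs from exactly one codeword in one place, which is what ‘perfect’ means here.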

Here are a number of simple tricks for creating new codes from old. 

Definition 5.5. If $C$ is a code of length $n$ the parity check extension $C^+$ of
$C$ is the code of length $n + 1$ given by

$$C^+ = \left\{ \mathbf{x} \in \mathbb{F}_2^{n+1} : (x_1, x_2, \ldots, x_n) \in C,\ \sum_{j=1}^{n+1} x_j = 0 \right\}.$$

Definition 5.6. If $C$ is a code of length $n$ the truncation $C^-$ of $C$ is the
code of length $n - 1$ given by

$$C^- = \{(x_1, x_2, \ldots, x_{n-1}) : (x_1, x_2, \ldots, x_n) \in C \text{ for some } x_n \in \mathbb{F}_2\}.$$

Definition 5.7. If $C$ is a code of length $n$ the shortening (or puncturing)
$C'$ of $C$ is the code of length $n - 1$ given by

$$C' = \{(x_1, x_2, \ldots, x_{n-1}) : (x_1, x_2, \ldots, x_{n-1}, 0) \in C\}.$$

Lemma 5.8. If $C$ is linear so is its parity check extension $C^+$, its truncation
$C^-$ and its shortening $C'$.

How can we combine two linear codes $C_1$ and $C_2$? Our first thought might
be to look at their direct sum

$$C_1 \oplus C_2 = \{(\mathbf{x}|\mathbf{y}) : \mathbf{x} \in C_1,\ \mathbf{y} \in C_2\},$$

but this is unlikely to be satisfactory.

Lemma 5.9. If $C_1$ and $C_2$ are linear codes then we have the following
relation between minimum distances.

$$d(C_1 \oplus C_2) = \min(d(C_1), d(C_2)).$$

On the other hand if $C_1$ and $C_2$ satisfy rather particular conditions we
can obtain a more promising construction.

Definition 5.10. Suppose $C_1$ and $C_2$ are linear codes of length $n$ with
$C_1 \supseteq C_2$ (i.e. with $C_2$ a subspace of $C_1$). We define the bar product $C_1|C_2$ of $C_1$
and $C_2$ to be the code of length $2n$ given by

$$C_1|C_2 = \{(\mathbf{x}|\mathbf{x}+\mathbf{y}) : \mathbf{x} \in C_1,\ \mathbf{y} \in C_2\}.$$

Lemma 5.11. Let $C_1$ and $C_2$ be linear codes of length $n$ with $C_1 \supseteq C_2$. Then
the bar product $C_1|C_2$ is a linear code with

$$\operatorname{rank} C_1|C_2 = \operatorname{rank} C_1 + \operatorname{rank} C_2.$$

The minimum distance of $C_1|C_2$ satisfies the inequality

$$d(C_1|C_2) \geq \min(2d(C_1), d(C_2)).$$
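
As an illustration of Lemma 5.11, the sketch below forms the bar product of the paper tape (even weight) code of length 4 with the repetition code of length 4. The choice of these two codes and the helper names are our own assumptions; any pair with $C_1 \supseteq C_2$ would do.

```python
from itertools import product

def min_distance(code):
    """Minimum distance of a linear code = minimum non-zero weight."""
    return min(sum(w) for w in code if any(w))

def bar_product(C1, C2):
    """The code {(x | x + y) : x in C1, y in C2}, with addition over F_2."""
    return {x + tuple((a + b) % 2 for a, b in zip(x, y)) for x in C1 for y in C2}

# C1: even weight code of length 4 (rank 3, distance 2).
C1 = {x for x in product((0, 1), repeat=4) if sum(x) % 2 == 0}
# C2: repetition code of length 4 (rank 1, distance 4), a subspace of C1.
C2 = {(0, 0, 0, 0), (1, 1, 1, 1)}

bar = bar_product(C1, C2)
rank_bound_met = len(bar) == len(C1) * len(C2)          # rank 3 + 1 = 4
distance_bound = min(2 * min_distance(C1), min_distance(C2))
```

Here the bound of Lemma 5.11 is attained, whereas the direct sum of the same two codes would only have minimum distance 2.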

We now return to the construction of specific codes. Recall that the
Hamming codes are suitable for situations when the error rate $p$ is very
small and we want a high information rate. The Reed-Muller codes are suitable
when the error rate is very high and we are prepared to sacrifice information
rate. They were used by NASA for the radio transmissions from its planetary
probes (a task which has been compared to signalling across the Atlantic with
a child’s torch⁹).

We start by considering the $2^d$ points $P_0, P_1, \ldots, P_{2^d-1}$ of the space
$X = \mathbb{F}_2^d$. Our code words will be of length $n = 2^d$ and will correspond to the
indicator functions $\mathbb{I}_A$ on $X$. More specifically the possible code word $\mathbf{c}^A$ is
given by

$$c^A_i = 1 \text{ if } P_i \in A, \qquad c^A_i = 0 \text{ otherwise,}$$

for some $A \subseteq X$.

In addition to the usual vector space structure on $\mathbb{F}_2^n$ we define a new
operation

$$\mathbf{c}^A \wedge \mathbf{c}^B = \mathbf{c}^{A \cap B}.$$

Thus if $\mathbf{x}, \mathbf{y} \in \mathbb{F}_2^n$,

$$(x_0, x_1, \ldots, x_{n-1}) \wedge (y_0, y_1, \ldots, y_{n-1}) = (x_0 y_0, x_1 y_1, \ldots, x_{n-1} y_{n-1}).$$

9 Strictly speaking the comparison is meaningless. However, it sounds impressive and
that is the main thing.

Finally we consider the collection of $d$ hyperplanes

$$\pi_j = \{\mathbf{p} \in X : p_j = 0\} \quad [1 \leq j \leq d]$$

in $\mathbb{F}_2^d$ and the corresponding indicator functions

$$\mathbf{h}^j = \mathbf{c}^{\pi_j},$$

together with the special vector

$$\mathbf{h}^0 = \mathbf{c}^X = (1, 1, \ldots, 1).$$

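Exercise 5.12. Suppose that $\mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathbb{F}_2^n$ and $A, B \subseteq X$.

(i) Show that $\mathbf{x} \wedge \mathbf{y} = \mathbf{y} \wedge \mathbf{x}$.

(ii) Show that $(\mathbf{x} + \mathbf{y}) \wedge \mathbf{z} = \mathbf{x} \wedge \mathbf{z} + \mathbf{y} \wedge \mathbf{z}$.

(iii) Show that $\mathbf{h}^0 \wedge \mathbf{x} = \mathbf{x}$.

(iv) If $\mathbf{c}^A + \mathbf{c}^B = \mathbf{c}^E$ find $E$ in terms of $A$ and $B$.

(v) If $\mathbf{h}^0 + \mathbf{c}^A = \mathbf{c}^E$ find $E$ in terms of $A$.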

We refer to $\mathcal{A}_0 = \{\mathbf{h}^0\}$ as the set of terms of order zero. If $\mathcal{A}_k$ is the set
of terms of order at most $k$ then the set $\mathcal{A}_{k+1}$ of terms of order at most $k + 1$
is defined by

$$\mathcal{A}_{k+1} = \{\mathbf{a} \wedge \mathbf{h}^j : \mathbf{a} \in \mathcal{A}_k,\ 1 \leq j \leq d\}.$$

Less formally but more clearly the elements of order 1 are the $\mathbf{h}^i$, the elements
of order 2 are the $\mathbf{h}^i \wedge \mathbf{h}^j$ with $i < j$, the elements of order 3 are the
$\mathbf{h}^i \wedge \mathbf{h}^j \wedge \mathbf{h}^k$ with $i < j < k$ and so on.

Definition 5.13. Using the notation established above, the Reed-Muller code
$RM(d,r)$ is the linear code (i.e. subspace of $\mathbb{F}_2^n$) generated by the terms of
order $r$ or less.

Although the formal definition of the Reed-Muller codes looks pretty
impenetrable at first sight, once we have looked at sufficiently many examples
it should become clear what is going on.

Example 5.14. (i) The $RM(3,0)$ code is the repetition code of length 8.

(ii) The $RM(3,1)$ code is the parity check extension of Hamming's original
code.

(iii) The $RM(3,2)$ code is the paper tape code of length 8.

(iv) The $RM(3,3)$ code is the trivial code consisting of all the elements
of $\mathbb{F}_2^8$.

We now prove the key properties of the Reed-Muller codes. We use the
notation established above.

Theorem 5.15. (i) The elements of order $d$ or less (that is the collection of
all possible wedge products formed from the $\mathbf{h}^i$) span $\mathbb{F}_2^n$.

(ii) The elements of order $d$ or less are linearly independent.

(iii) The dimension of the Reed-Muller code $RM(d,r)$ is

$$\binom{d}{0} + \binom{d}{1} + \binom{d}{2} + \cdots + \binom{d}{r}.$$

(iv) Using the bar product notation we have

$$RM(d,r) = RM(d-1,r)\,|\,RM(d-1,r-1).$$

(v) The minimum weight of $RM(d,r)$ is exactly $2^{d-r}$.
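
Parts (iii) and (v) can be checked by brute force for small $d$. The sketch below builds the generators of $RM(d,r)$ directly from the indicator functions of the hyperplanes $\pi_j$; the ordering of the points of $X$ and the function names are our own assumptions.

```python
from itertools import combinations, product
from math import comb

def reed_muller_generators(d, r):
    """Wedge products of at most r of the hyperplane indicators h^1, ..., h^d."""
    points = list(product((0, 1), repeat=d))           # the 2^d points of X
    h = [tuple(1 if p[j] == 0 else 0 for p in points) for j in range(d)]
    gens = []
    for k in range(r + 1):
        for subset in combinations(range(d), k):
            word = [1] * len(points)                   # h^0, the empty product
            for j in subset:
                word = [a * b for a, b in zip(word, h[j])]
            gens.append(tuple(word))
    return gens

def span(gens):
    """All F_2-linear combinations of the generators."""
    words = {tuple([0] * len(gens[0]))}
    for g in gens:
        words |= {tuple((a + b) % 2 for a, b in zip(w, g)) for w in words}
    return words

def dimension_formula(d, r):
    """The sum of binomial coefficients in Theorem 5.15 (iii)."""
    return sum(comb(d, i) for i in range(r + 1))

code = span(reed_muller_generators(3, 1))
min_weight = min(sum(w) for w in code if any(w))
```

For $RM(3,1)$ this gives a code with $2^4$ words and minimum weight $2^{3-1} = 4$, in agreement with Example 5.14 (ii).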

Exercise 5.16. The Mariner mission to Mars used the $RM(5,1)$ code. What
was its information rate? What proportion of errors could it correct in a
single code word?

Exercise 5.17. Show that the $RM(d, d-2)$ code is the parity extension code
of the Hamming $(N, N-d)$ code with $N = 2^d - 1$.

6 Polynomials and fields 

This section is starred and will not be covered in lectures. Its object is 
to make plausible the few facts from modern¹⁰ algebra that we shall need.
They were covered, along with much else, in the course 04 (Groups, rings 
and fields) but attendance at that course is no more required for this course 
than is reading Joyce’s Ulysses before going for a night out at an Irish pub. 
Anyone capable of criticising the imprecision and general slackness of the 
account that follows obviously can do better themselves and should omit 
this section. 

A field K is an object equipped with addition and multiplication which 
follow the same rules as do addition and multiplication in R. The only rule 
which will cause us trouble is 

If $x \in K$ and $x \neq 0$ then we can find $y \in K$ such that $xy = 1$. ★

Obvious examples of fields include $\mathbb{R}$, $\mathbb{C}$ and $\mathbb{F}_2$.

We are particularly interested in polynomials over fields but here an
interesting difficulty arises.

10 Modern, that is, in 1850.

Example 6.1. We have $t^2 + t = 0$ for all $t \in \mathbb{F}_2$.

To get round this, we distinguish between the polynomial in the
‘indeterminate’ $X$

$$P(X) = \sum_{j=0}^{n} a_j X^j$$

with coefficients $a_j \in K$ and its evaluation $P(t) = \sum_{j=0}^{n} a_j t^j$ for some
$t \in K$. We manipulate polynomials in $X$ according to the standard rules for
polynomials but say that

$$\sum_{j=0}^{n} a_j X^j = 0$$

if and only if $a_j = 0$ for all $j$. Thus $X^2 + X$ is a non-zero polynomial over
$\mathbb{F}_2$ all of whose values are zero.

The following result is familiar, in essence, from school mathematics. 

Lemma 6.2 (Remainder theorem). (i) If $P$ is a polynomial over a field
$K$ and $a \in K$ then we can find a polynomial $Q$ and an $r \in K$ such that

$$P(X) = (X - a)Q(X) + r.$$

(ii) If $P$ is a polynomial over a field $K$ and $a \in K$ is such that $P(a) = 0$
then we can find a polynomial $Q$ such that

$$P(X) = (X - a)Q(X).$$

The key to much of the elementary theory of polynomials lies in the fact 
that we can apply Euclid’s algorithm to obtain results like the following. 

Theorem 6.3. Suppose that $\mathcal{P}$ is a set of polynomials, which contains at least
one non-zero polynomial and has the following properties.

(i) If $Q$ is any polynomial and $P \in \mathcal{P}$ then the product $PQ \in \mathcal{P}$.

(ii) If $P_1, P_2 \in \mathcal{P}$ then $P_1 + P_2 \in \mathcal{P}$.

Then we can find a non-zero $P_0 \in \mathcal{P}$ which divides every $P \in \mathcal{P}$.

Proof. Consider a non-zero polynomial $P_0$ of smallest degree in $\mathcal{P}$. □

Recall that the polynomial $P(X) = X^2 + 1$ has no roots in $\mathbb{R}$ (that is
$P(t) \neq 0$ for all $t \in \mathbb{R}$). However by considering the collection of formal
expressions $a + bi$ [$a, b \in \mathbb{R}$] with the obvious formal definitions of addition
and multiplication and subject to the further condition $i^2 + 1 = 0$ we obtain
a field $\mathbb{C} \supseteq \mathbb{R}$ in which $P$ has a root (since $P(i) = 0$). We can perform a
similar trick with other fields.

Example 6.4. If $P(X) = X^2 + X + 1$ then $P$ has no roots in $\mathbb{F}_2$. However
if we consider

$$\mathbb{F}_2[\omega] = \{0, 1, \omega, 1 + \omega\}$$

with obvious formal definitions of addition and multiplication and subject to
the further condition $\omega^2 + \omega + 1 = 0$ then $\mathbb{F}_2[\omega]$ is a field containing $\mathbb{F}_2$ in
which $P$ has a root (since $P(\omega) = 0$).

Proof. The only thing we really need prove is that $\mathbb{F}_2[\omega]$ is a field and to do
that the only thing we need to prove is that ★ holds. Since

$$(1 + \omega)\omega = 1$$

this is easy. □

In order to state a correct generalisation of the ideas of the previous 
paragraph we need a preliminary definition. 

Definition 6.5. If P is a polynomial over a field K we say that P is re- 
ducible if there exists a non-constant polynomial Q of degree strictly less 
than P which divides P. If P is a non-constant polynomial which is not 
reducible then P is irreducible. 

Theorem 6.6. If $P$ is an irreducible polynomial of degree $n \geq 2$ over a field
$K$, then $P$ has no roots in $K$. However if we consider

$$K[\omega] = \left\{ \sum_{j=0}^{n-1} a_j \omega^j : a_j \in K \right\}$$

with the obvious formal definitions of addition and multiplication and subject
to the further condition $P(\omega) = 0$ then $K[\omega]$ is a field containing $K$ in which
$P$ has a root.

Proof. The only thing we really need prove is that $K[\omega]$ is a field and to do
that the only thing we need to prove is that ★ holds. Let $Q$ be a non-zero
polynomial of degree at most $n - 1$. Since $P$ is irreducible, the polynomials
$P$ and $Q$ have no common factor of degree 1 or more. Hence, by Euclid’s
algorithm we can find polynomials $R$ and $S$ such that

$$R(X)Q(X) + S(X)P(X) = 1$$

and so $R(\omega)Q(\omega) + S(\omega)P(\omega) = 1$. But $P(\omega) = 0$ so $R(\omega)Q(\omega) = 1$ and we
have proved ★. □
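
The proof is constructive, and over $\mathbb{F}_2$ it can be carried out in a few lines. In the sketch below a polynomial over $\mathbb{F}_2$ is stored as a Python integer whose bit $j$ is the coefficient of $X^j$; this encoding and the function names are our own assumptions, not notation from the notes. The function `inverse_mod` runs Euclid's algorithm on $P$ and $Q$ while keeping track of the polynomial $R$ with $RQ \equiv 1$ modulo $P$.

```python
def pdivmod(a, b):
    """Quotient and remainder of a by b in F_2[X] (b must be non-zero)."""
    q = 0
    while a.bit_length() >= b.bit_length():
        shift = a.bit_length() - b.bit_length()
        q ^= 1 << shift
        a ^= b << shift
    return q, a

def pmul(a, b):
    """Product of two polynomials over F_2."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out

def inverse_mod(q, p):
    """Return R with R*Q = 1 modulo P, assuming P and Q have no common factor."""
    r0, r1 = p, q
    s0, s1 = 0, 1                      # invariant: r_i = s_i * q (mod p)
    while r1 != 1:
        if r1 == 0:
            raise ValueError("P and Q have a common factor")
        quo, rem = pdivmod(r0, r1)
        r0, r1 = r1, rem
        s0, s1 = s1, s0 ^ pmul(quo, s1)
    return pdivmod(s1, p)[1]

# Example 6.4: in F_2[w] with w^2 + w + 1 = 0 the inverse of w is 1 + w.
inverse_of_omega = inverse_mod(0b10, 0b111)
```

The same routine works for any irreducible $P$ over $\mathbb{F}_2$, for instance $X^4 + X + 1$, which gives a field with 16 elements.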


In a proper algebra course we would simply define

$$K[\omega] = K[X]/(P(X))$$

where $(P(X))$ is the ideal generated by $P(X)$. This is a cleaner procedure
which avoids the use of such phrases as ‘the obvious formal definitions of
addition and multiplication’ but the underlying idea remains the same.

Lemma 6.7. If $P$ is a polynomial over a field $K$ which does not factorise
completely into linear factors then we can find a field $L \supseteq K$ in which $P$ has
more linear factors.

Proof. Factor $P$ into irreducible factors and choose a factor $Q$ which is not
linear. By Theorem 6.6 we can find a field $L \supseteq K$ in which $Q$ has a root $a$
say and so by Lemma 6.2 a linear factor $X - a$. Since any linear factor of $P$
in $K$ remains a factor in the bigger field $L$ we are done. □

Theorem 6.8. If $P$ is a polynomial over a field $K$ then we can find a field
$L \supseteq K$ in which $P$ factorises completely into linear factors.

We shall be interested in finite fields (that is fields K with only a finite 
number of elements). A glance at our method of proving Theorem 6.8 shows 
that the following result holds. 

Lemma 6.9. If $P$ is a polynomial over a finite field $K$ then we can find a
finite field $L \supseteq K$ in which $P$ factorises completely.

In this context, we note yet another useful simple consequence of Euclid’s 
algorithm. 

Lemma 6.10. Suppose that $P$ is an irreducible polynomial over a field $K$
which has a linear factor $X - a$ in some field $L \supseteq K$. If $Q$ is a polynomial
over $K$ which has the factor $X - a$ in $L$ then $P$ divides $Q$.

We shall need a lemma on repeated roots.

Lemma 6.11. Let $K$ be a field. If $P(X) = \sum_{j=0}^{n} a_j X^j$ is a polynomial over
$K$ we define $P'(X) = \sum_{j=1}^{n} j a_j X^{j-1}$.

(i) If $P$ and $Q$ are polynomials $(P + Q)' = P' + Q'$ and $(PQ)' = P'Q + PQ'$.

(ii) If $P$ and $Q$ are polynomials with $P(X) = (X - a)^2 Q(X)$ then

$$P'(X) = 2(X - a)Q(X) + (X - a)^2 Q'(X).$$

(iii) If $P$ is divisible by $(X - a)^2$ then $P(a) = P'(a) = 0$.

If $L$ is a field containing $\mathbb{F}_2$ then $2y = (1 + 1)y = 0y = 0$ for all $y \in L$. We
can thus deduce the following result which will be used in the next section.

Lemma 6.12. If $L$ is a field containing $\mathbb{F}_2$ and $n$ is an odd integer then
$X^n - 1$ can have no repeated linear factors as a polynomial over $L$.

We also need a result on roots of unity given as part (v) of the next
lemma.

Lemma 6.13. (i) If $G$ is a finite Abelian group and $x, y \in G$ have coprime
orders $r$ and $s$ then $xy$ has order $rs$.

(ii) If $G$ is a finite Abelian group and $x, y \in G$ have orders $r$ and $s$ then
we can find an element $z$ of $G$ with order the lowest common multiple of $r$
and $s$.

(iii) If $G$ is a finite Abelian group then there exists an $N$ and an $h \in G$
such that $h$ has order $N$ and $g^N = e$ for all $g \in G$.

(iv) If $G$ is a finite subset of a field $K$ which is a group under
multiplication then $G$ is cyclic.

(v) Suppose $n$ is an odd integer. If $L$ is a field containing $\mathbb{F}_2$ such that
$X^n - 1$ factorises completely into linear terms then we can find an $\omega \in L$
such that the roots of $X^n - 1$ are $1, \omega, \omega^2, \ldots, \omega^{n-1}$. (We call $\omega$ a primitive
nth root of unity.)

Proof. (ii) Consider $z = x^u y^v$ where $u$ is a divisor of $r$, $v$ is a divisor of $s$,
$r/u$ and $s/v$ are coprime and $rs/(uv) = \operatorname{lcm}(r, s)$.

(iii) Let $h$ be an element of highest order in $G$ and use (ii).

(iv) By (iii) we can find an integer $N$ and an $h \in G$ such that $h$ has order
$N$ and any element $g \in G$ satisfies $g^N = 1$. Thus $X^N - 1$ has a linear factor
$X - g$ for each $g \in G$ and so $\prod_{g \in G}(X - g)$ divides $X^N - 1$. It follows that
the order $|G|$ of $G$ cannot exceed $N$. But by Lagrange’s theorem $N$ divides
$|G|$. Thus $|G| = N$ and $h$ generates $G$.

(v) Observe that $G = \{\omega : \omega^n = 1\}$ is an Abelian group with exactly $n$
elements (since $X^n - 1$ has no repeated roots) and use (iv). □

Here is another interesting consequence of Lemma 6.13 (iv).

Lemma 6.14. If $K$ is a field with $2^n$ elements containing $\mathbb{F}_2$ then there is
an element $k$ of $K$ such that

$$K = \{0\} \cup \{k^r : 0 \leq r \leq 2^n - 2\}$$

and $k^{2^n - 1} = 1$.

Proof. Observe that $K \setminus \{0\}$ forms a group under multiplication. □

With this hint it is not hard to show that there is indeed a field with $2^n$
elements containing $\mathbb{F}_2$.

Lemma 6.15. Let $L$ be some field containing $\mathbb{F}_2$ in which $X^{2^n - 1} - 1$
factorises completely. Then

$$K = \{x \in L : x^{2^n} = x\}$$

is a field with $2^n$ elements containing $\mathbb{F}_2$.

Lemma 6.14 shows that there is (up to field isomorphism) only one
field with $2^n$ elements containing $\mathbb{F}_2$. We call it $\mathbb{F}_{2^n}$. We call an element $k$
with the properties given in Lemma 6.14 a primitive element of $\mathbb{F}_{2^n}$.
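
To see primitive elements in a concrete case, the sketch below works in the field with 16 elements, realised as $\mathbb{F}_2[X]/(X^4 + X + 1)$. The choice of this polynomial, the integer encoding of field elements (bit $j$ is the coefficient of $X^j$) and the function names are our own assumptions. The code simply computes the multiplicative order of every non-zero element and keeps those of order $2^4 - 1 = 15$.

```python
def mulmod(a, b, p):
    """Multiply a and b in F_2[X]/(P); both inputs must have degree < deg P."""
    deg = p.bit_length() - 1
    out = 0
    while b:
        if b & 1:
            out ^= a
        b >>= 1
        a <<= 1
        if (a >> deg) & 1:          # reduce as soon as the degree reaches deg P
            a ^= p
    return out

def order(k, p):
    """Multiplicative order of a non-zero element k (P assumed irreducible)."""
    r, x = 1, k
    while x != 1:
        x = mulmod(x, k, p)
        r += 1
    return r

P = 0b10011                          # X^4 + X + 1
primitive = [k for k in range(1, 16) if order(k, P) == 15]
```

If $P$ were not irreducible the quotient would not be a field and `order` might not terminate, so the irreducibility assumption matters.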

7 Cyclic codes 

In this section we discuss a subclass of linear codes, the so called cyclic codes. 

Definition 7.1. A linear code $C$ in $\mathbb{F}_2^n$ is called cyclic if

$$(a_0, \ldots, a_{n-2}, a_{n-1}) \in C \Rightarrow (a_1, a_2, \ldots, a_{n-1}, a_0) \in C.$$

Let us establish a correspondence between $\mathbb{F}_2^n$ and the polynomials on $\mathbb{F}_2$
modulo $X^n - 1$ by setting

$$P_{\mathbf{a}} = \sum_{j=0}^{n-1} a_j X^j$$

whenever $\mathbf{a} \in \mathbb{F}_2^n$. (Of course, $X^n - 1 = X^n + 1$ but in this context the first
expression seems more natural.)

Exercise 7.2. With the notation just established show that

(i) $P_{\mathbf{a}} + P_{\mathbf{b}} = P_{\mathbf{a}+\mathbf{b}}$,

(ii) $P_{\mathbf{a}} = 0$ if and only if $\mathbf{a} = \mathbf{0}$.

Lemma 7.3. A code $C$ in $\mathbb{F}_2^n$ is cyclic if and only if $\mathcal{P}_C = \{P_{\mathbf{a}} : \mathbf{a} \in C\}$
satisfies the following two conditions (working modulo $X^n - 1$).

(i) If $f, g \in \mathcal{P}_C$ then $f + g \in \mathcal{P}_C$.

(ii) If $f \in \mathcal{P}_C$ and $g$ is any polynomial then the product $fg \in \mathcal{P}_C$.

(In the language of abstract algebra, $C$ is cyclic if and only if $\mathcal{P}_C$ is an ideal
of the quotient ring $\mathbb{F}_2[X]/(X^n - 1)$.)

From now on we shall talk of the code word $f(X)$ when we mean the code
word $\mathbf{a}$ with $P_{\mathbf{a}}(X) = f(X)$. An application of Euclid’s algorithm gives the
following useful result.

Lemma 7.4. A code $C$ of length $n$ is cyclic if and only if (working modulo
$X^n - 1$, and using the conventions established above) there exists a polynomial
$g$ such that

$$C = \{f(X)g(X) : f \text{ a polynomial}\}.$$

(In the language of abstract algebra, $\mathbb{F}_2[X]$ is a Euclidean domain and so a
principal ideal domain. Thus every ideal of the quotient $\mathbb{F}_2[X]/(X^n - 1)$ is
principal.) We call $g(X)$ a generator polynomial for $C$.

Lemma 7.5. A polynomial $g$ is a generator for a cyclic code of length $n$ if
and only if it divides $X^n - 1$.

Thus we must seek generators among the factors of $X^n - 1 = X^n + 1$. If
there are no conditions on $n$ the result can be rather disappointing.

Exercise 7.6. If we work with polynomials over $\mathbb{F}_2$ then

$$X^{2^r} + 1 = (X + 1)^{2^r}.$$

In order to avoid this problem and to be able to make use of Lemma 6.12 
we shall take n odd from now on. (In this case the cyclic codes are said to be 
separable.) Notice that the task of finding irreducible factors (that is factors 
with no further factorisation) is a finite one. 

Lemma 7.7. Consider codes of length $n$. Suppose that $g(X)h(X) = X^n - 1$.
Then $g$ is a generator of a cyclic code $C$ and $h$ is a generator for a cyclic
code which is the reverse of $C^\perp$.

As an immediate corollary we have the following remark. 

Lemma 7.8. The dual of a cyclic code is itself cyclic. 

Lemma 7.9. If a cyclic code $C$ of length $n$ has generator $g$ of degree $n - r$
then $g(X), Xg(X), \ldots, X^{r-1}g(X)$ form a basis for $C$.

Cyclic codes are thus easy to specify (we just need to write down the 
generator polynomial g) and to encode. 

Example 7.10. There are three cyclic codes of length 7 corresponding to 
irreducible polynomials of which two are versions of Hamming's original code. 
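
The case $n = 7$ is small enough to explore directly. The sketch below uses the factorisation $X^7 + 1 = (X + 1)(X^3 + X + 1)(X^3 + X^2 + 1)$ over $\mathbb{F}_2$, builds the code generated by $g(X) = X^3 + X + 1$ from the basis of Lemma 7.9 and checks that it is closed under the cyclic shift of Definition 7.1. Polynomials are stored as integers (bit $j$ is the coefficient of $X^j$); that encoding and the function names are our own assumptions.

```python
def pmul(a, b):
    """Product of two polynomials over F_2, stored as integers."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out

def to_word(f, n):
    """Coefficient vector (a_0, ..., a_{n-1}) of a polynomial of degree < n."""
    return tuple((f >> j) & 1 for j in range(n))

def span(gens):
    """All F_2-linear combinations of the generators."""
    words = {tuple([0] * len(gens[0]))}
    for g in gens:
        words |= {tuple((a + b) % 2 for a, b in zip(w, g)) for w in words}
    return words

def shift(w):
    """(a_0, ..., a_{n-1}) -> (a_1, ..., a_{n-1}, a_0), as in Definition 7.1."""
    return w[1:] + w[:1]

n = 7
g = 0b1011                                   # X^3 + X + 1, degree n - r with r = 4
r = n - (g.bit_length() - 1)
basis = [to_word(g << i, n) for i in range(r)]   # g, Xg, X^2 g, X^3 g
code = span(basis)
is_cyclic = all(shift(w) in code for w in code)
```

The resulting code has 16 words and minimum weight 3, so it has the same parameters as Hamming's original code.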

We know that $X^n + 1$ factorises completely over some larger finite field
and, since $n$ is odd, we know by Lemma 6.12 that it has no repeated factors.
The same is therefore true for any polynomial dividing it.

Lemma 7.11. Suppose that $g$ is a generator of a cyclic code $C$ of odd length
$n$. Suppose further that $g$ factorises completely into linear factors in some
field $K$ containing $\mathbb{F}_2$. If $g = g_1 g_2 \cdots g_k$ with each $g_j$ irreducible over $\mathbb{F}_2$ and
$A$ is a set consisting only of the roots of the $g_j$ and containing at least one
root of each $g_j$ [$1 \leq j \leq k$], then

$$C = \{f \in \mathbb{F}_2[X] : f(\alpha) = 0 \text{ for all } \alpha \in A\}.$$

Definition 7.12. A defining set for a cyclic code $C$ is a set $A$ of elements
in some field $K$ containing $\mathbb{F}_2$ such that $f \in \mathbb{F}_2[X]$ belongs to $C$ if and only
if $f(\alpha) = 0$ for all $\alpha \in A$.

(Note that, if $C$ has length $n$, $A$ must be a set of zeros of $X^n - 1$.)

Lemma 7.13. Suppose that

$$A = \{\alpha_1, \alpha_2, \ldots, \alpha_r\}$$

is a defining set for a cyclic code $C$ in some field $K$ containing $\mathbb{F}_2$. Let $B$ be
the $n \times r$ matrix over $K$ whose $j$th column is

$$(1, \alpha_j, \alpha_j^2, \ldots, \alpha_j^{n-1})^T.$$

Then a vector $\mathbf{a} \in \mathbb{F}_2^n$ is a code word in $C$ if and only if

$$\mathbf{a}B = \mathbf{0}$$

in $K$.

The columns in $B$ are not parity checks in the usual sense since the code
entries lie in $\mathbb{F}_2$ and the computations take place in the larger field $K$.

With this background we can discuss a famous family of codes known
as the BCH (Bose, Ray-Chaudhuri, Hocquenghem) codes. Recall that a
primitive nth root of unity is a root $\alpha$ of $X^n - 1 = 0$ such that every root
is a power of $\alpha$.

Definition 7.14. Suppose that $n$ is odd and $K$ is a field containing $\mathbb{F}_2$ in
which $X^n - 1$ factorises into linear factors. Suppose that $\alpha \in K$ is a primitive
nth root of unity. A cyclic code $C$ with defining set

$$A = \{\alpha, \alpha^2, \ldots, \alpha^{\delta - 1}\}$$

is a BCH code of design distance $\delta$.

Note that the rank of $C$ will be $n - k$ where $k$ is the degree of the product of
those irreducible factors of $X^n - 1$ over $\mathbb{F}_2$ which have a zero in $A$. Notice
also that $k$ may be very much larger than $\delta$.

Example 7.15. (i) If $K$ is a field containing $\mathbb{F}_2$ then $(a + b)^2 = a^2 + b^2$ for
all $a, b \in K$.

(ii) If $P \in \mathbb{F}_2[X]$ and $K$ is a field containing $\mathbb{F}_2$ then $P(a)^2 = P(a^2)$ for
all $a \in K$.

(iii) Let $K$ be a field containing $\mathbb{F}_2$ in which $X^7 - 1$ factorises into linear
factors. If $\beta$ is a root of $X^3 + X + 1$ in $K$ then $\beta$ is a primitive root of unity
and $\beta^2$ is also a root of $X^3 + X + 1$.

(iv) We continue with the notation of (iii). The BCH code with $\{\beta, \beta^2\}$ as
defining set is Hamming's original (7,4) code.

The next theorem contains the key fact about BCH codes. 

Theorem 7.16. The minimum distance for a BCH code is at least as great 
as the design distance. 
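
Before turning to the proof, the theorem can be checked by exhaustion in a small case. The sketch below takes $n = 15$, realises a field with 16 elements as $\mathbb{F}_2[X]/(X^4 + X + 1)$ and uses $\alpha = X$ as the primitive 15th root of unity; these choices, the integer encoding of field elements and the function names are our own assumptions. It lists every word of length 15 that vanishes at $\alpha, \alpha^2, \alpha^3, \alpha^4$, that is, the BCH code of design distance 5.

```python
from itertools import product

def alpha_powers(p=0b10011, n=15):
    """exp[j] = alpha^j in F_2[X]/(P) with alpha = X; elements stored as ints."""
    deg = p.bit_length() - 1
    exp, x = [], 1
    for _ in range(n):
        exp.append(x)
        x <<= 1
        if (x >> deg) & 1:
            x ^= p
    return exp

def evaluate(word, i, exp):
    """Value at alpha^i of the polynomial with coefficient vector `word`."""
    n = len(exp)
    acc = 0
    for j, bit in enumerate(word):
        if bit:
            acc ^= exp[(i * j) % n]      # addition in the field is XOR
    return acc

def bch_code(delta, exp):
    """All binary words of length n vanishing at alpha, ..., alpha^(delta - 1)."""
    n = len(exp)
    return [w for w in product((0, 1), repeat=n)
            if all(evaluate(w, i, exp) == 0 for i in range(1, delta))]

exp = alpha_powers()
code = bch_code(5, exp)
min_weight = min(sum(w) for w in code if any(w))
```

In this instance the minimum weight equals the design distance, although Theorem 7.16 only promises that it is at least as large.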

Our proof of Theorem 7.16 relies on showing that the matrix $B$ of Lemma 7.13
is non-singular for a BCH code. To do this we use a result which every
undergraduate knew in 1950.

Lemma 7.17 (The van der Monde determinant). We work over a field
$K$. The determinant

$$\begin{vmatrix} 1 & 1 & 1 & \cdots & 1 \\ x_1 & x_2 & x_3 & \cdots & x_n \\ x_1^2 & x_2^2 & x_3^2 & \cdots & x_n^2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_1^{n-1} & x_2^{n-1} & x_3^{n-1} & \cdots & x_n^{n-1} \end{vmatrix} = \prod_{1 \leq j < i \leq n} (x_i - x_j).$$


How can we construct a decoder for a BCH code? From now on until
the end of this section we shall suppose that we are using the BCH code $C$
described in Definition 7.14. In particular $C$ will have length $n$ and defining
set

$$A = \{\alpha, \alpha^2, \ldots, \alpha^{\delta - 1}\}$$

where $\alpha$ is a primitive nth root of unity in $K$. Let $t$ be the largest integer
with $2t + 1 \leq \delta$. We show how we can correct up to $t$ errors.

Suppose that a codeword $\mathbf{c} = (c_0, c_1, \ldots, c_{n-1})$ is transmitted and that
the string received is $\mathbf{r}$. We write $\mathbf{e} = \mathbf{r} - \mathbf{c}$ and assume that

$$\mathcal{E} = \{0 \leq j \leq n - 1 : e_j \neq 0\}$$

has no more than $t$ members. In other words $\mathbf{e}$ is the error vector and we
assume that there are no more than $t$ errors. We write

$$c(X) = \sum_{j=0}^{n-1} c_j X^j, \qquad r(X) = \sum_{j=0}^{n-1} r_j X^j, \qquad e(X) = \sum_{j=0}^{n-1} e_j X^j.$$

Definition 7.18. The error locator polynomial is

$$\sigma(X) = \prod_{j \in \mathcal{E}} (1 - \alpha^j X)$$

and the error co-locator is

$$\omega(X) = \sum_{i=0}^{n-1} e_i \alpha^i \prod_{j \in \mathcal{E},\ j \neq i} (1 - \alpha^j X).$$

Informally we write

$$\frac{\omega(X)}{\sigma(X)} = \sum_{i=0}^{n-1} \frac{e_i \alpha^i}{1 - \alpha^i X}.$$

We take $\omega(X) = \sum_j \omega_j X^j$ and $\sigma(X) = \sum_j \sigma_j X^j$. Note that $\omega$ has degree at
most $t - 1$ and $\sigma$ degree at most $t$. Note that we know that $\sigma_0 = 1$ so both
the polynomials $\omega$ and $\sigma$ have $t$ unknown coefficients.

Lemma 7.19. If the error locator polynomial is given the value of $\mathbf{e}$ and
so of $\mathbf{c}$ can be obtained directly.

We wish to make use of relations of the form

$$\frac{1}{1 - \alpha^j X} = \sum_{r=0}^{\infty} (\alpha^j X)^r.$$

Unfortunately it is not clear what meaning to assign to such a relation. One
way round is to work modulo $Z^{2t}$ (more formally, to work in $K[Z]/(Z^{2t})$).
We then have $Z^u = 0$ for all integers $u \geq 2t$.

Lemma 7.20. If we work modulo $Z^{2t}$ then

$$(1 - \alpha^j Z) \sum_{m=0}^{2t-1} (\alpha^j Z)^m = 1.$$

Thus, if we work modulo $Z^{2t}$, as we shall from now on, we may define

$$\frac{1}{1 - \alpha^j Z} = \sum_{m=0}^{2t-1} (\alpha^j Z)^m.$$

Lemma 7.21. With the conventions already introduced,

(i) $\dfrac{\omega(Z)}{\sigma(Z)} = \sum_{m=0}^{2t-1} Z^m e(\alpha^{m+1})$.

(ii) $e(\alpha^m) = r(\alpha^m)$ for all $1 \leq m \leq 2t$.

(iii) $\dfrac{\omega(Z)}{\sigma(Z)} = \sum_{m=0}^{2t-1} Z^m r(\alpha^{m+1})$.

(iv) $\omega(Z) = \sum_{m=0}^{2t-1} Z^m r(\alpha^{m+1}) \sigma(Z)$.

(v) $\omega_j = \sum_{u+v=j} r(\alpha^{u+1}) \sigma_v$ for all $0 \leq j \leq 2t - 1$.

(vi) $0 = \sum_{u+v=j} r(\alpha^{u+1}) \sigma_v$ for all $t \leq j \leq 2t - 1$.

(vii) The conditions in (vi) determine $\sigma$ completely.

Part (vi) of Lemma 7.21 completes our search for a decoding method, since
$\sigma$ determines $\mathcal{E}$, $\mathcal{E}$ determines $\mathbf{e}$ and $\mathbf{e}$ determines $\mathbf{c}$. It is worth noting that
the system of equations in part (v) suffices to determine the pair $\sigma$ and $\omega$
directly.
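
The decoding scheme can be tried out on the BCH code of length 15 and design distance 5, for which $t = 2$. The following is only a sketch under our own assumptions: the field with 16 elements is realised as $\mathbb{F}_2[X]/(X^4 + X + 1)$ with $\alpha = X$, field elements are stored as integers, and instead of solving the equations of Lemma 7.21 (vi) by linear algebra we search exhaustively for the solution $\sigma$ of lowest degree with $\sigma_0 = 1$. A practical decoder would not search in this way.

```python
P, N, T = 0b10011, 15, 2        # X^4 + X + 1, length 15, up to T = 2 errors

EXP = []                         # EXP[j] = alpha^j with alpha = X
x = 1
for _ in range(N):
    EXP.append(x)
    x <<= 1
    if x & 0b10000:
        x ^= P
LOG = {v: j for j, v in enumerate(EXP)}

def gmul(a, b):
    """Product in the field with 16 elements, via logarithms to base alpha."""
    return 0 if a == 0 or b == 0 else EXP[(LOG[a] + LOG[b]) % N]

def syndromes(r):
    """S[m] = r(alpha^m) for 1 <= m <= 2T (index 0 is unused)."""
    S = [0] * (2 * T + 1)
    for m in range(1, 2 * T + 1):
        for j, bit in enumerate(r):
            if bit:
                S[m] ^= EXP[(m * j) % N]
    return S

def satisfies_vi(sigma, S):
    """Check 0 = sum_{u+v=j} r(alpha^(u+1)) sigma_v for T <= j <= 2T - 1."""
    for j in range(T, 2 * T):
        total = 0
        for v, sv in enumerate(sigma):
            total ^= gmul(S[j - v + 1], sv)
        if total:
            return False
    return True

def locator(S):
    """Lowest-degree (1, s1, s2) satisfying (vi), found by exhaustive search."""
    candidates = [(1, 0, 0)]
    candidates += [(1, s1, 0) for s1 in range(1, 16)]
    candidates += [(1, s1, s2) for s2 in range(1, 16) for s1 in range(16)]
    for sigma in candidates:
        if satisfies_vi(sigma, S):
            return sigma
    return None

def decode(r):
    """Flip the positions j with sigma(alpha^(-j)) = 0; None if no sigma fits."""
    sigma = locator(syndromes(r))
    if sigma is None:
        return None
    fixed = list(r)
    for j in range(N):
        inv, value, power = EXP[(-j) % N], 0, 1
        for coeff in sigma:
            value ^= gmul(coeff, power)
            power = gmul(power, inv)
        if value == 0:
            fixed[j] ^= 1
    return fixed
```

Trying the candidates in order of increasing degree matters: when fewer than $t$ errors occur the equations in (vi) have several solutions and the error locator is the one of lowest degree.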

Compact disk players use BCH codes. Of course errors are likely to 
occur in bursts (corresponding to scratches etc) and this is dealt with by 
distributing the bits (digits) in a single codeword over a much longer stretch 
of track. The code used can correct a burst of 4000 consecutive errors (2.5 
mm of track). 

Unfortunately none of the codes we have considered work anywhere near 
the Shannon bound (see Theorem 3.14). We might suspect that this is be- 
cause they are linear but Elias has shown that this is not the case. (We just 
state the result without proof.) 


Theorem 7.22. In Theorem 3.14 we can replace ‘code’ by ‘linear code’.


It is clear that much remains to be done. 
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Just as pure algebra has contributed greatly to the study of error correct- 
ing codes so the study of error correcting codes has contributed greatly to 
the study of pure algebra. The story of one such contribution is set out in 
T. M. Thompson’s From Error-Correcting Codes through Sphere Packings to
Simple Groups [8] — a good, not too mathematical, account of the discovery 
of the last sporadic simple groups by Conway and others. 

8 Shift registers 

In this section we move towards cryptography but the topic discussed will 
turn out to have connections with the decoding of BCH codes as well. 

Definition 8.1. A general feedback shift register is a map $f : \mathbb{F}_2^d \to \mathbb{F}_2^d$
given by

$$f(x_0, x_1, \ldots, x_{d-2}, x_{d-1}) = (x_1, x_2, \ldots, x_{d-1}, C(x_0, x_1, \ldots, x_{d-2}, x_{d-1}))$$

where $C$ is a map $C : \mathbb{F}_2^d \to \mathbb{F}_2$. The stream associated to an initial fill
$(y_0, y_1, \ldots, y_{d-1})$ is the sequence

$y_0, y_1, \ldots, y_j, y_{j+1}, \ldots$ with $y_n = C(y_{n-d}, y_{n-d+1}, \ldots, y_{n-1})$ for all $n \geq d$.

Example 8.2. If the general feedback shift $f$ given in Definition 8.1 is a
permutation then $C$ is linear in the first variable, i.e.

$$C(x_0, x_1, \ldots, x_{d-2}, x_{d-1}) = x_0 + C'(x_1, x_2, \ldots, x_{d-2}, x_{d-1}).$$

Definition 8.3. We say that the function $f$ of Definition 8.1 is a linear
feedback register if

$$C(x_0, x_1, \ldots, x_{d-1}) = a_0 x_0 + a_1 x_1 + \cdots + a_{d-1} x_{d-1},$$

with $a_0 = 1$.

Exercise 8.4. Discuss briefly the effect of omitting the condition $a_0 = 1$
from Definition 8.3.
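
A linear feedback register is easy to simulate. The sketch below assumes the indexing convention in which the new term is $x_n = a_0 x_{n-d} + a_1 x_{n-d+1} + \cdots + a_{d-1} x_{n-1}$, with the coefficient list `a` and the initial fill given as Python lists; the function name and the particular register are our own choices.

```python
def lfsr_stream(a, fill, length):
    """First `length` terms of the stream x_n = a[0] x_{n-d} + ... + a[d-1] x_{n-1} over F_2."""
    d = len(a)
    stream = list(fill)
    while len(stream) < length:
        window = stream[-d:]                       # (x_{n-d}, ..., x_{n-1})
        stream.append(sum(ai * xi for ai, xi in zip(a, window)) % 2)
    return stream[:length]

# d = 4 with x_n = x_{n-4} + x_{n-3}.
stream = lfsr_stream([1, 1, 0, 0], [1, 0, 0, 0], 45)
period = next(p for p in range(1, 31) if stream[:15] == stream[p:p + 15])
```

Since there are only $2^d$ possible fills, the stream of any such register with $a_0 = 1$ is periodic; this particular register runs through all 15 non-zero fills before repeating.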

The discussion of the linear recurrence

$$x_n = a_0 x_{n-d} + a_1 x_{n-d+1} + \cdots + a_{d-1} x_{n-1}$$

over $\mathbb{F}_2$ follows the IA discussion of the same problem over $\mathbb{R}$ but is
complicated by the fact that

$$n^2 = n$$

in $\mathbb{F}_2$. We assume that $a_0 \neq 0$ and consider the auxiliary polynomial

$$C(X) = X^d - a_{d-1} X^{d-1} - \cdots - a_1 X - a_0.$$

In the exercise below

$$\binom{n}{v} = \frac{n(n-1)\cdots(n-v+1)}{v!}$$

is the appropriate polynomial in $n$.


Exercise 8.5. Consider the linear recurrence

$$x_n = a_0 x_{n-d} + a_1 x_{n-d+1} + \cdots + a_{d-1} x_{n-1} \qquad (*)$$

with $a_j \in \mathbb{F}_2$ and $a_0 \neq 0$.

(i) Suppose $K$ is a field containing $\mathbb{F}_2$ such that the auxiliary polynomial
$C$ has a root $\alpha$ in $K$. Then $\alpha^n$ is a solution of (*) in $K$.

(ii) Suppose $K$ is a field containing $\mathbb{F}_2$ such that the auxiliary polynomial
$C$ has $d$ distinct roots $\alpha_1, \alpha_2, \ldots, \alpha_d$ in $K$. Then the general solution of (*)
in $K$ is

$$x_n = \sum_{j=1}^{d} b_j \alpha_j^n$$

for some $b_j \in K$. If $x_0, x_1, \ldots, x_{d-1} \in \mathbb{F}_2$ then $x_n \in \mathbb{F}_2$ for all $n$.

(iii) Work out the first few lines of Pascal’s triangle modulo 2. Show that
the functions $f_j : \mathbb{Z} \to \mathbb{F}_2$

$$f_j(n) = \binom{n}{j}$$

are linearly independent in the sense that

$$\sum_{j=0}^{m} a_j f_j(n) = 0$$

for all $n$ implies $a_j = 0$ for $0 \leq j \leq m$.

(iv) Suppose $K$ is a field containing $\mathbb{F}_2$ such that the auxiliary polynomial
$C$ factorises completely into linear factors. If the root $\alpha_u$ has multiplicity $m_u$
[$1 \leq u \leq q$] then the general solution of (*) in $K$ is

$$x_n = \sum_{u=1}^{q} \sum_{v=0}^{m_u - 1} b_{u,v} \binom{n}{v} \alpha_u^n$$

for some $b_{u,v} \in K$. If $x_0, x_1, \ldots, x_{d-1} \in \mathbb{F}_2$ then $x_n \in \mathbb{F}_2$ for all $n$.

A strong link with the problem of BCH decoding is provided by
Theorem 8.7 below.

Definition 8.6. If we have a sequence (or stream) $x_0, x_1, x_2, \ldots$ of elements
of $\mathbb{F}_2$ then its generating function $G$ is given by

$$G(Z) = \sum_{n=0}^{\infty} x_n Z^n.$$

Theorem 8.7. The stream $(x_n)$ comes from a linear feedback generator with
auxiliary polynomial $C$ if and only if the generating function for the stream
is (formally) of the form

$$G(Z) = \frac{B(Z)}{C(Z)}$$

with $B$ a polynomial of degree strictly less than that of $C$.

If we can recover $C$ from $G$ then we have recovered the linear feedback
generator from the stream.

The link with BCH codes is established by looking at Lemma 7.21 (iii) 
and making the following remark. 

Lemma 8.8. If a stream $(x_n)$ comes from a linear feedback generator with
auxiliary polynomial $C$ of degree $d$ then $C$ is determined by the condition

$$G(Z)C(Z) \equiv B(Z) \mod Z^{2d}$$

with $B$ a polynomial of degree at most $d - 1$.

We thus have the following problem.

Problem Given a generating function $G$ for a stream and knowing that

$$G(Z) = \frac{B(Z)}{C(Z)}$$

with $B$ a polynomial of degree less than that of $C$ and the constant term in
$C$ is $c_0 = 1$, recover $C$.

The Berlekamp-Massey method In this method we do not assume that the
degree $d$ of $C$ is known. The Berlekamp-Massey solution to this problem is
based on the observation that, since

$$\sum_{j=0}^{d} c_j x_{n-j} = 0$$

starting at $A_r$ if it is known that $r > d$. For each $A_j$ we evaluate $\det A_j$. If
$\det A_j \neq 0$ then $j \neq d$. If $\det A_j = 0$ then $j$ is a good candidate for $d$ so
we solve ★ on the assumption that $d = j$. (Note that a one dimensional
subspace of $\mathbb{F}_2^{d+1}$ contains only one non-zero vector.) We then check our
candidate for $(c_0, c_1, \ldots, c_d)$ over as many terms of the stream as we wish. If
it fails the test we then know that $d > j$ and we start again.

As we have stated it, the Berlekamp-Massey method is not an algorithm 
in the strict sense of the term although it becomes one if we put an upper 
bound on the possible values of d. (A little thought shows that if no upper 
bound is put on d. no algorithm is possible because, with a suitable initial 
stream a linear feedback register with large d can be made to produce a 
stream whose initial values would be produced by a linear feedback register 
with much smaller d. For the same reason the Berlekamp-Massey will produce 
the B of smallest degree which gives G and not necessarily the original B.) 
In practice, however, the Berlekamp-Massey method is very effective in cases 
when d is unknown. 

It might be thought that evaluating determinants is hard but we can use
row reduction to triangularise the matrices and use the fact that $A_{j-1}$ is a
sub-matrix of $A_j$ to reduce the work still further.
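For comparison, here is a minimal sketch of the iterative form in which the Berlekamp-Massey method is usually programmed. It is not the determinant search described above: it is the standard textbook iteration over $\mathbb{F}_2$, which corrects a candidate feedback polynomial one stream term at a time. The names `lfsr_stream` and `berlekamp_massey_gf2` are hypothetical and are chosen only for this illustration.

```python
def lfsr_stream(c, fill, length):
    """Extend an initial fill by x_n = c_1 x_{n-1} + ... + c_d x_{n-d} over F_2.

    c = [1, c_1, ..., c_d] lists the coefficients of the feedback polynomial
    and fill must contain exactly d bits.
    """
    x = list(fill)
    d = len(c) - 1
    while len(x) < length:
        x.append(sum(c[j] & x[-j] for j in range(1, d + 1)) % 2)
    return x[:length]


def berlekamp_massey_gf2(bits):
    """Return (L, c): a shortest length L and coefficients c = [1, c_1, ..., c_L]
    with sum_j c[j] * bits[n - j] = 0 (mod 2) for every n >= L in the data."""
    n_bits = len(bits)
    c = [1] + [0] * n_bits      # current candidate polynomial
    b = [1] + [0] * n_bits      # candidate held before the last length change
    L, m = 0, 1                 # register length, steps since last length change
    for n in range(n_bits):
        # discrepancy between the predicted and the observed term
        disc = bits[n]
        for j in range(1, L + 1):
            disc ^= c[j] & bits[n - j]
        if disc == 0:
            m += 1
            continue
        previous = c[:]
        for j in range(n_bits + 1 - m):   # c(Z) <- c(Z) + Z^m b(Z)
            c[j + m] ^= b[j]
        if 2 * L <= n:                    # the register must grow
            L, b, m = n + 1 - L, previous, 1
        else:
            m += 1
    return L, c[:L + 1]


stream = lfsr_stream([1, 0, 0, 1, 1], [1, 0, 0, 0], 20)
length, coefficients = berlekamp_massey_gf2(stream)
```

In this small case the sketch recovers length 4 and the coefficients `[1, 0, 0, 1, 1]` that generated the stream. In general it returns only a shortest register consistent with the terms supplied, which is the caveat made in the text.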


A method based on Euclid’s algorithm (This is starred and will be omitted if 
time is short.) For this method we need to know the degree d of C. 


Writing $G(Z) = \sum_{j=0}^{\infty} x_j Z^j$ we take $A(Z) = \sum_{j=0}^{2d-1} x_j Z^j$ so that

\[ \frac{B(Z)}{C(Z)} = A(Z) + Z^{2d} U(Z) \]

for some power series $U$. It follows that

\[ B(Z) = A(Z)C(Z) + Z^{2d} W(Z) \]

where $A(Z)$ is known but $B(Z)$, $C(Z)$ and the power series $W(Z)$ are
unknown.

We now apply Euclid’s algorithm to $R_0(Z) = Z^{2d}$, $R_1(Z) = A(Z)$ obtaining,
as usual,

\[ R_0(Z) = R_1(Z)Q_1(Z) + R_2(Z) \]

\[ R_1(Z) = R_2(Z)Q_2(Z) + R_3(Z) \]

\[ R_2(Z) = R_3(Z)Q_3(Z) + R_4(Z) \]

and so on, but instead of allowing the algorithm to run its full course we
stop at the first point when the degree of $R_j$ is no greater than $d$. Call
the polynomial $R_j$ so obtained $\tilde{B}$. By the method associated with Bezout’s
theorem we can find polynomials $\tilde{C}$ and $\tilde{W}$ such that

\[ R_j(Z) = R_1(Z)\tilde{C}(Z) + R_0(Z)\tilde{W}(Z) \]

and so

\[ \tilde{B}(Z) = A(Z)\tilde{C}(Z) + Z^{2d}\tilde{W}(Z). \]


Lemma 8.9. With the notation above.

(i) $\tilde{B}$ and $\tilde{C}$ both have degree $d$ or less.

(ii) The power series expansions of $\tilde{B}(Z)/\tilde{C}(Z)$ and $B(Z)/C(Z)$ agree up to the term
$Z^{2d}$.

(iii) The first $2d$ terms of the power series expansion of $(\tilde{B}C - B\tilde{C})/(C\tilde{C})$ vanish.

(iv) The power series of $(\tilde{B}C - B\tilde{C})/(C\tilde{C})$ is the generating sequence for a linear
feedback system with auxiliary (or feedback) polynomial $C\tilde{C}$.

(v) We have $\tilde{B} = B$ and $\tilde{C} = C$.


This method is called the Skorobogatov decoder.


9 A short homily on cryptography 

Cryptography is the science of code making. Cryptanalysis is the art of code 
breaking. 

Two thousand years ago Lucretius wrote that ‘Only recently has the 
true nature of things been discovered’. In the same way mathematicians 
are apt to feel that ‘Only recently has the true nature of cryptography been 
discovered’. The new mathematical science of cryptography with its promise
of codes which are ‘provably hard to break’ seems to make everything that 
has gone before irrelevant. 

It should, however, be observed that the best cryptographic systems of our 
ancestors (such as diplomatic ‘book codes’) served their purpose of ensuring 
secrecy for a relatively small number of messages between a relatively small 
number of people extremely well. It is the modern requirement for secrecy 
on an industrial scale to cover endless streams of messages between many 
centres which has made necessary the modern science of cryptography. 

More pertinently it should be remembered that the German Submarine 
Enigma codes not only appeared to be ‘provably hard to break’ (though not 
against the modern criteria of what this should mean) but, considered in
isolation, probably were unbreakable in practice$^{11}$. Fortunately the Submarine
codes formed part of an ‘Enigma system’ with certain exploitable weaknesses. 
(For an account of how these weaknesses arose and how they were exploited 
see Kahn’s Seizing the Enigma [3].) 

Even the best codes are like the lock on a safe. However good the lock 
is, the safe may be broken open by brute force, or stolen together with its 
contents, or a key holder may be persuaded by fraud or force to open the
lock, or the presumed contents of the safe may have been tampered with
before they go into the safe, or . . . . The coding schemes we shall consider
are, at best, cryptographic elements of larger possible cryptographic systems.
The planning of cryptographic systems requires not only mathematics but 
engineering, economics, psychology, humility and an ability to learn from 
past mistakes. Those who do not learn the lessons of history are condemned 
to repeat them. 

In considering a cryptographic system it is important to consider its
purpose. Consider a message M sent by A to B. Possible aims include

Secrecy A and B can be sure that no third party X can read the message M.

Integrity A and B can be sure that no third party X can alter the message M.

Authenticity B can be sure that A sent the message M.

Non-repudiation B can prove to a third party that A sent the message M.

When you fill out a cheque giving the sum both in numbers and words you 
are seeking to protect the integrity of the cheque. When you sign a traveller’s 
cheque ‘in the presence of the paying officer’ the process is intended, from
your point of view, to protect authenticity and, from the bank’s point of view,
to produce non-repudiation.

11 Some versions remained unbroken until the end of the war. 
Another point to consider is the level of security aimed at. It hardly
matters if a few people use forged tickets to travel on the underground; it
does matter if a single unauthorised individual can gain privileged access to 
a bank’s central computer system. If secrecy is aimed at, how long must the 
secret be kept? Some military and financial secrets need only remain secret 
for a few hours, others must remain secret for years. 

We must also, to conclude this non-exhaustive list, consider the level of
security required. Here are three possible levels.

(1) Prospective opponents should find it hard to compromise your system
even if they are in possession of a plentiful supply of encoded messages $C_i$.

(2) Prospective opponents should find it hard to compromise your system
even if they are in possession of a plentiful supply of pairs $(M_i, C_i)$ of messages
$M_i$ together with their encodings $C_i$.

(3) Prospective opponents should find it hard to compromise your system
even if they are allowed to produce messages $M_i$ and given their encodings
$C_i$.

Clearly safety at level (3) implies safety at level (2) and safety at level (2) 
implies safety at level (1). Roughly speaking, the best Enigma codes sat- 
isfied (1). The German Navy believed on good but mistaken grounds that 
they satisfied (2). Level (3) would have appeared evidently impossible to 
attain until a few years ago. Nowadays, level (3) is considered a minimal 
requirement for a really secure system. 


10 Stream cyphers 

One natural way of enciphering is to use a stream cypher. We work with
streams (that is, sequences) of elements of $\mathbb{F}_2$. We use a cypher stream
$k_0, k_1, k_2, \dots$. The plain text stream $p_0, p_1, p_2, \dots$ is enciphered as the cypher text
stream $z_0, z_1, z_2, \dots$ given by

\[ z_n = p_n + k_n. \]

This is an example of a private key or symmetric system. The security of
the system depends on a secret (in our case the cypher stream) $k$ shared
between the encipherer and the decipherer. Knowledge of an enciphering method
makes it easy to work out a deciphering method and vice versa. In our case
a deciphering method is given by the observation that

\[ p_n = z_n + k_n. \]

(Indeed, writing $\alpha(p) = p + k$ we see that the enciphering function $\alpha$ has
the property that $\alpha^2 = \iota$ the identity map. Cyphers like this are called
symmetric.)
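The observation that enciphering twice gives back the plain text can be sketched in a few lines. In this illustration addition in $\mathbb{F}_2$ is the exclusive-or of bits; the function name `xor_streams` is hypothetical, and the key bits are drawn from Python's `secrets` module merely as a stand-in for a shared secret stream.

```python
import secrets


def xor_streams(a, b):
    """Add two bit streams term by term in F_2 (that is, XOR them)."""
    return [x ^ y for x, y in zip(a, b)]


plain = [0, 1, 1, 1, 0, 0, 1, 0]
key = [secrets.randbits(1) for _ in plain]   # stand-in for the shared stream k
cypher = xor_streams(plain, key)             # z_n = p_n + k_n
recovered = xor_streams(cypher, key)         # p_n = z_n + k_n
```

Whatever key bits are drawn, `recovered` equals `plain`, since the same operation both enciphers and deciphers.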
In the one-time pad first discussed by Vernam in 1926 the cypher stream is
a random sequence $k_j = K_j$ where the $K_j$ are independent random variables
with

\[ \Pr(K_j = 0) = \Pr(K_j = 1) = 1/2. \]

If we write $Z_j = p_j + K_j$ then we see that the $Z_j$ are independent random
variables with

\[ \Pr(Z_j = 0) = \Pr(Z_j = 1) = 1/2. \]

Thus (in the absence of any knowledge of the ciphering stream) the
code-breaker is just faced by a stream of perfectly random binary digits.
Decipherment is impossible in principle.

It is sometimes said that it is hard to find random sequences, and it is 
indeed rather harder than might appear at first sight, but it is not too difficult 
to rig up a system for producing ‘sufficiently random’ sequences$^{12}$. The secret
services of the former Soviet Union were particularly fond of one-time pads.
The real difficulty lies in the necessity for sharing the secret sequence k. If
a random sequence is reused it ceases to be random (it becomes ‘the same
code as last Wednesday’ or ‘the same code as Paris uses’) so, when there
is a great deal of code traffic, new one-time pads must be sent out. If random 
bits can be safely communicated so can ordinary messages and the exercise 
becomes pointless. 

In practice we would like to start from a short shared secret ‘seed’ and
generate a ciphering string $k$ that ‘behaves like a random sequence’. This
leads us straight into deep philosophical waters$^{13}$. As might be expected
there is an illuminating discussion in Chapter III of Knuth’s marvellous The
Art of Computer Programming [6]. Note in particular his warning:

. . . random numbers should not be generated with a method chosen
at random. Some theory should be used.

One way that we might try to generate our ciphering string is to use a
general feedback shift register $f$ of length $d$ with the initial fill $(k_0, k_1, \dots, k_{d-1})$
as the secret seed.

12 Take ten of your favourite long books, convert to binary sequences $x_{j,n}$ and set
$k_n = \sum_{j=1}^{10} x_{j,1000+j+n} + s_n$ where $s_n$ is the output of your favourite ‘pseudo-random number
generator’. Give a disc with a copy of k to your friend and, provided both of you obey
some elementary rules, your correspondence will be safe from MI5. The anguished debate 
in the US about codes and privacy refers to the privacy of large organisations and their 
clients, not the privacy of communication from individual to individual. 

13 Where we drown at once, since the best (at least in my opinion) modern view is that
any sequence that can be generated by a program of reasonable length from a ‘seed’ of 
reasonable size is automatically non-random. 
Lemma 10.1. If $f$ is a general feedback shift register of length $d$ then given
any initial fill $(k_0, k_1, \dots, k_{d-1})$ there will exist $N, M \le 2^d$ such that the
output stream $k$ satisfies $k_{r+N} = k_r$ for all $r \ge M$.

Lemma 10.2. Suppose that $f$ is a linear feedback register of length $d$.

(i) $f(x_0, x_1, \dots, x_{d-1}) = (x_0, x_1, \dots, x_{d-1})$ if and only if $(x_0, x_1, \dots, x_{d-1}) =
(0, 0, \dots, 0)$.

(ii) Given any initial fill $(k_0, k_1, \dots, k_{d-1})$ there will exist $N, M \le 2^d - 1$
such that the output stream $k$ satisfies $k_{r+N} = k_r$ for all $r \ge M$.

We can complement Lemma 10.2 by using Lemma 6.15 and the associated
discussion.

Lemma 10.3. A linear feedback register of length $d$ attains its maximal period
$2^d - 1$ (for a non-trivial initial fill) when the roots of the feedback polynomial
are primitive elements of $\mathbb{F}_{2^d}$.

(We will note why this result is plausible but we will not prove it.) 
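For small $d$ the statement of Lemma 10.3 can be checked by brute force. The sketch below simply steps a register until its contents return to the initial fill. It assumes that the last feedback coefficient $c_d$ is 1, so that every fill lies on a cycle, and the name `lfsr_period` is hypothetical. For $d = 4$ the rule $x_n = x_{n-3} + x_{n-4}$ comes from a primitive polynomial, whereas $x_n = x_{n-1} + x_{n-2} + x_{n-3} + x_{n-4}$ comes from a polynomial which is irreducible but divides $X^5 - 1$.

```python
def lfsr_period(c, fill):
    """Number of steps before a linear feedback register returns to its fill.

    c = [1, c_1, ..., c_d] gives the rule x_n = c_1 x_{n-1} + ... + c_d x_{n-d}
    over F_2; fill is the tuple (x_0, ..., x_{d-1}).
    """
    d = len(c) - 1
    if c[d] != 1:
        raise ValueError("this sketch assumes c_d = 1")
    start = tuple(fill)
    state, steps = start, 0
    while True:
        new_bit = 0
        for j in range(1, d + 1):
            new_bit ^= c[j] & state[d - j]    # state[d - j] holds x_{n-j}
        state = state[1:] + (new_bit,)
        steps += 1
        if state == start:
            return steps


primitive_case = lfsr_period([1, 0, 0, 1, 1], (1, 0, 0, 0))
short_case = lfsr_period([1, 1, 1, 1, 1], (1, 0, 0, 0))
```

The first register attains the maximal period $2^4 - 1 = 15$; the second, although equally long, has period 5.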

It is well known that short period streams are dangerous. During World 
War II the British Navy used codes whose period was adequately long for 
peace time use. The massive increase in traffic required by war time
conditions meant that the period was now too short. By dint of immense toil
German naval code breakers were able to identify coincidences and by this 
means slowly break the British codes. 

Unfortunately, whilst short periods are definitely unsafe it does not follow 
that long periods guarantee safety. Using the Berlekamp-Massey method we 
see that stream codes based on linear feedback registers are unsafe at level 
(2). 

Lemma 10.4. Suppose that an unknown cypher stream $k_0, k_1, k_2, \dots$ is
produced by an unknown linear feedback register $f$ of unknown length $d \le D$.
The plain text stream $p_0, p_1, p_2, \dots$ is enciphered as the cypher text stream
$z_0, z_1, z_2, \dots$ given by

\[ z_n = p_n + k_n. \]

If we are given $p_0, p_1, \dots, p_{2D-1}$ and $z_0, z_1, \dots, z_{2D-1}$ then we can find $k_r$
for all $r$.

Thus if we have a message of length twice the length of the linear feedback 
register together with its encipherment the code is broken. 
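The attack can be sketched as follows under a simplifying assumption which the lemma does not make: that the register length $d$ is known exactly and that the resulting linear equations have a unique solution. Adding known plain text to cypher text gives $2d$ terms of the key stream, and the feedback coefficients are then found by Gaussian elimination over $\mathbb{F}_2$. All function names are hypothetical.

```python
def solve_gf2(rows, rhs):
    """Solve a linear system over F_2 by Gaussian elimination.

    Returns one solution (free variables set to 0) or None if inconsistent.
    """
    n = len(rows[0])
    aug = [row[:] + [b] for row, b in zip(rows, rhs)]
    pivots, r = [], 0
    for col in range(n):
        pivot = next((i for i in range(r, len(aug)) if aug[i][col]), None)
        if pivot is None:
            continue
        aug[r], aug[pivot] = aug[pivot], aug[r]
        for i in range(len(aug)):
            if i != r and aug[i][col]:
                aug[i] = [a ^ b for a, b in zip(aug[i], aug[r])]
        pivots.append(col)
        r += 1
    if any(row[-1] and not any(row[:-1]) for row in aug):
        return None
    solution = [0] * n
    for i, col in enumerate(pivots):
        solution[col] = aug[i][-1]
    return solution


def recover_feedback(key_bits, d):
    """Find [c_1, ..., c_d] with k_n = c_1 k_{n-1} + ... + c_d k_{n-d}
    from the first 2d key stream terms."""
    rows = [[key_bits[n - j] for j in range(1, d + 1)] for n in range(d, 2 * d)]
    rhs = [key_bits[n] for n in range(d, 2 * d)]
    return solve_gf2(rows, rhs)


def extend_stream(fill, c, length):
    """Run the recovered register forward from a known fill."""
    x = list(fill)
    d = len(c)
    while len(x) < length:
        x.append(sum(c[j - 1] & x[-j] for j in range(1, d + 1)) % 2)
    return x
```

Given the first $2d$ plain text and cypher text terms, an opponent would form `known = [p ^ z for p, z in zip(plain, cypher)]`, call `recover_feedback(known, d)` and then use `extend_stream` to predict the rest of the key stream.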

It is easy to construct immensely complicated looking linear feedback 
registers with hundreds of registers. Lemma 10.4 shows that, from the point 
of view of a determined, well equipped and technically competent opponent,
cryptographic systems based on such registers are the equivalent of leaving 
your house key hidden under the door mat. Professionals say that such 
systems seek ‘security by obscurity’. 

However, if you do not wish to baffle the CIA, but merely prevent little 
old ladies in tennis shoes watching subscription television without paying for 
it, systems based on linear feedback registers are cheap and quite effective. 
Whatever they may say in public, large companies are happy to tolerate a 
certain level of fraud. So long as 99.9% of the calls made are paid for, the 
profits of a telephone company are essentially unaffected by the 0.1% which
‘break the system’. 

What happens if we try some simple tricks to increase the complexity of
the cypher text stream?

Lemma 10.5. If $x_n$ is a stream produced by a linear feedback system of
length $N$ with auxiliary polynomial $P$ and $y_n$ is a stream produced by a linear
feedback system of length $M$ with auxiliary polynomial $Q$ then $x_n + y_n$ is a
stream produced by a linear feedback system of length $N + M$ with auxiliary
polynomial $P(X)Q(X)$.

Note that this means that adding streams from two linear feedback systems
is no more economical than producing the same effect with one. Indeed the
situation may be worse since a stream produced by a linear feedback system of
given length may, possibly, also be produced by another linear feedback system
of shorter length.

Lemma 10.6. Suppose that $x_n$ is a stream produced by a linear feedback
system of length $N$ with auxiliary polynomial $P$ and $y_n$ is a stream produced
by a linear feedback system of length $M$ with auxiliary polynomial $Q$. Let $P$
have roots $\alpha_1, \alpha_2, \dots, \alpha_N$ and $Q$ have roots $\beta_1, \beta_2, \dots, \beta_M$ over some field
$K \supseteq \mathbb{F}_2$. Then $x_n y_n$ is a stream produced by a linear feedback system of length
$NM$ with auxiliary polynomial

\[ \prod_{1 \le i \le N} \prod_{1 \le j \le M} (X - \alpha_i \beta_j). \]

We shall probably only prove Lemmas 10.5 and 10.6 in the case when all
roots are distinct, leaving the more general case as an easy exercise. We
shall also not prove that the polynomial $\prod_{1 \le i \le N} \prod_{1 \le j \le M} (X - \alpha_i \beta_j)$ obtained
in Lemma 10.6 actually lies in $\mathbb{F}_2[X]$ but (for those who are familiar with the
phrase in quotes) this is an easy exercise in ‘symmetric functions of roots’.

Here is an even easier remark. 
Lemma 10.7. Suppose that $x_n$ is a stream which is periodic with period $N$
and $y_n$ is a stream which is periodic with period $M$. Then the streams $x_n + y_n$
and $x_n y_n$ are periodic with periods dividing the lowest common multiple of $N$
and $M$.

Exercise 10.8. One of the most confidential German codes (called FISH by
the British) involved a complex mechanism which the British found could be
simulated by two loops of paper tape of length 1501 and 1497. If $k_n = x_n + y_n$
where $x_n$ is a stream of period 1501 and $y_n$ is a stream of period 1497 what is
the longest possible period of $k_n$? How many consecutive values of $k_n$ do you
need to specify the sequence completely?

It might be thought that the lengthening of the underlying linear feedback
system obtained in Lemma 10.6 is worth having but it is bought at a
substantial price. Let me illustrate this by an informal argument. Suppose
we have 10 streams $x_{j,n}$ (without any peculiar properties) produced by linear
feedback registers of length about 100. If we form $k_n = \prod_{j=1}^{10} x_{j,n}$ then the
Berlekamp-Massey method requires of the order of $10^{20}$ consecutive values of
$k_n$ and the periodicity of $k_n$ can be made still more astronomical. Our cypher
key stream $k_n$ appears safe from prying eyes. However it is doubtful if the
prying eyes will mind. Observe that (under reasonable conditions) about $2^{-1}$
of the $x_{j,n}$ will have the value 1 and about $2^{-10}$ of the $k_n = \prod_{j=1}^{10} x_{j,n}$ will
have value 1. Thus if $z_n = p_n + k_n$, in more than 999 cases out of a 1000 we
will have $z_n = p_n$. Even if we just combine two streams $x_n$ and $y_n$ in the way
suggested we may expect $x_n y_n = 0$ for about 75% of the time.
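The proportions quoted can be confirmed by counting. The sketch below assumes, as an idealisation of the ‘reasonable conditions’, that the bits of the separate streams behave like independent fair coin tosses, so that every pattern of bits is equally likely; the name `fraction_of_ones` is hypothetical.

```python
from itertools import product
from math import prod


def fraction_of_ones(num_streams):
    """Fraction of bit patterns (x_1, ..., x_r) in F_2^r whose product is 1."""
    patterns = list(product((0, 1), repeat=num_streams))
    return sum(prod(bits) for bits in patterns) / len(patterns)


ten_streams = fraction_of_ones(10)    # one pattern in 2**10
two_streams = fraction_of_ones(2)     # one pattern in 4
```

Only the all-ones pattern gives product 1, so the key stream bit is 0, and the plain text passes through unchanged, in all but one case in $2^{10}$ for ten streams and in three cases out of four for two.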

Here is another example where the apparent complexity of the cypher key 
stream is substantially greater than its true complexity. 

Example 10.9. The following is a simplified version of a standard satellite
TV decoder. We have 3 streams $x_n$, $y_n$, $z_n$ produced by linear feedback
registers. If the cypher key stream is defined by

\[ k_n = x_n \text{ if } z_n = 0, \]

\[ k_n = y_n \text{ if } z_n = 1, \]

then

\[ k_n = (y_n + x_n) z_n + x_n \]

and the cypher key stream is that produced by a linear feedback register.
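The identity in Example 10.9 involves only three bits at a time, so it can be verified exhaustively. In the following sketch (with hypothetical function names) addition and multiplication in $\mathbb{F}_2$ are written as exclusive-or and bitwise and.

```python
from itertools import product


def select(x, y, z):
    """The decoder's rule: output x when z = 0 and y when z = 1."""
    return x if z == 0 else y


def polynomial_form(x, y, z):
    """The same rule written as (y + x) z + x over F_2."""
    return ((y ^ x) & z) ^ x


agree = all(select(x, y, z) == polynomial_form(x, y, z)
            for x, y, z in product((0, 1), repeat=3))
```

Since the two forms agree on all eight cases, the selected stream is built from sums and products of the three streams, to which Lemmas 10.5 and 10.6 apply.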

It might be thought that the best way round these difficulties is to use a
non-linear feedback generator $f$. This is not the easy way out that it appears.
If chosen by an amateur the complicated looking $f$ so produced will have the
apparent advantage that we do not know what is wrong with it and the very
real disadvantage that we do not know what is wrong with it.

Another approach is to observe that, so far as the potential code breaker
is concerned, the cypher stream method only combines the ‘unknown secret’
(here the feedback generator $f$ together with the seed $(k_0, k_1, \dots, k_{d-1})$) with
the unknown message $p$ in a rather simple way. It might be better to consider
a system with two functions $F : \mathbb{F}_2^m \times \mathbb{F}_2^n \to \mathbb{F}_2^q$ and $G : \mathbb{F}_2^m \times \mathbb{F}_2^q \to \mathbb{F}_2^n$ such
that

\[ G(k, F(k, p)) = p. \]

Here $k$ will be the shared secret, $p$ the message and $z = F(k, p)$ the encoded
message, which can be decoded by using the fact that $G(k, z) = p$.

In the next section we shall see that an even better arrangement is
possible. However, arrangements like this have the disadvantage that the
message p must be entirely known before it is transmitted and the encoded
message z must have been entirely received before it can be decoded. Stream
ciphers have the advantage that they can be decoded ‘on the fly’. They are 
also much more error tolerant. A mistake in the coding, transmission or 
decoding of a single element only produces an error in a single place of the 
sequence. There will continue to be circumstances where stream ciphers are 
appropriate. 

There is one further remark to be made. Suppose that, as is often the case,
we know $F$, that $n = q$ and we know the ‘encoded message’ $z$. Suppose
also that we know that the ‘unknown secret’ or ‘key’ $k \in \mathcal{K} \subseteq \mathbb{F}_2^m$ and the
‘unknown message’ $p \in \mathcal{P} \subseteq \mathbb{F}_2^n$. We are then faced with the problem: Solve
the system

\[ z = F(k, p) \text{ where } k \in \mathcal{K},\ p \in \mathcal{P}. \tag{★} \]

Speaking roughly, the task is hopeless unless ★ has a unique solution$^{14}$.
Speaking even more roughly, this is unlikely to happen if $|\mathcal{K}||\mathcal{P}| > 2^n$ and is
likely to happen if $2^n$ is substantially greater than $|\mathcal{K}||\mathcal{P}|$. (Here, as usual,
$|B|$ denotes the number of elements of $B$.)

14 ‘According to some, the primordial Torah was inscribed in black flames on white fire. 
At the moment of its creation, it appeared as a series of letters not yet joined up in 
the form of words. For this reason, in the Torah rolls there appear neither vowels nor 
punctuation, nor accents; for the original Torah was nothing but a disordered heap of 
letters. Furthermore, had it not been for Adam’s sin, these letters might have been joined 
differently to form another story. For the kabalist, God will abolish the present ordering 
of the letters, or else will teach us how to read them according to a new disposition only 
after the coming of the Messiah.’ ([1], Chapter 2.) 
Now recall the definition of the information rate given in Definition 1.2.
If the message set $\mathcal{M}$ has information rate $\mu$ and the key set (that is the
shared secret set) $\mathcal{K}$ has information rate $\kappa$ then, taking logarithms, we see
that if

\[ n - m\kappa - n\mu \]

is substantially greater than 0 then ★ is likely to have a unique solution, but
if it is substantially smaller this is unlikely.

Example 10.10. If instead of using binary code we consider an alphabet of
27 letters (the English alphabet plus a space) we must take logarithms to the
base 27 but the considerations above continue to apply. The English language
treated in this way has information rate about .4. (This is very much a
ball park figure. The information rate is certainly less than .5 and almost
certainly greater than .2.)

(i) In the Caesar code we replace the $i$th element of our alphabet by the
$(i + j)$th (modulo 27). The shared secret is a single letter (the code for A say).
We have $m = 1$, $\kappa = 1$ and $\mu \approx .4$.

\[ n - m\kappa - n\mu \approx .6n - 1. \]

If $n = 1$ (so $n - m\kappa - n\mu \approx -.4$) it is obviously impossible to decode the
message. If $n = 10$ (so $n - m\kappa - n\mu \approx 5$) a simple search through the 27
possibilities will almost always give a single possible decode.

(ii) In a simple substitution code a permutation of the alphabet is chosen
and applied to each letter of the message in turn. The shared secret is a sequence
of 26 letters (giving the coding of the first 26 letters, the 27th can then be
deduced). We have $m = 26$, $\kappa = 1$ and $\mu \approx .4$.

\[ n - m\kappa - n\mu \approx .6n - 26. \]

In the Dancing Men Sherlock Holmes solves such a code with $n = 68$ (so
$n - m\kappa - n\mu \approx 15$) without straining the reader’s credulity too much and I
would think that, unless the message is very carefully chosen, most of my
audience could solve such a code with $n = 200$ (so $n - m\kappa - n\mu \approx 100$).

(iii) In the one-time pad $m = n$ and $\kappa = 1$ so (if $\mu > 0$)

\[ n - m\kappa - n\mu = -n\mu \to -\infty \]

as $n \to \infty$.

(iv) Note that the larger $\mu$ is the slower $n - m\kappa - n\mu$ increases. This
corresponds to the very general statement that the higher the information
rate of the messages the harder it is to break the code in which they are sent.
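The arithmetic in Example 10.10 is easily mechanised. The sketch below evaluates the quantity $n - m\kappa - n\mu$ and the message length at which it changes sign. The function names are hypothetical and the information rate .4 is the ball park figure used above, not a measured value.

```python
def excess(n, m, kappa, mu):
    """The quantity n - m*kappa - n*mu for message length n and key length m."""
    return n - m * kappa - n * mu


def break_even_length(m, kappa, mu):
    """Message length n at which n - m*kappa - n*mu = 0 (needs mu < 1)."""
    return m * kappa / (1 - mu)


caesar_10 = excess(10, 1, 1, 0.4)                # Caesar code, n = 10
substitution_68 = excess(68, 26, 1, 0.4)         # the Dancing Men
substitution_200 = excess(200, 26, 1, 0.4)
substitution_threshold = break_even_length(26, 1, 0.4)
```

With these figures a simple substitution code passes from ‘probably ambiguous’ to ‘probably uniquely decodable’ at a little over 43 letters, which is why messages of length 68 and 200 fall on the solvable side.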
The ideas just introduced can be formalised by the notion of unicity
distance. 

Definition 10.11. The unicity distance of a code is the number of bits of 
message required to exceed the number of bits of information in the key plus 
the number of bits of information in the message. 

If the reader complains that there is a faint smell of red herring about this 
definition, I would be inclined to agree. Without a clearer discussion of 
‘information content’ than is given in this course it must remain more of a 
slogan than a definition.

If we only use our code once to send a message which is substantially 
shorter than the unicity distance we can be confident that no code breaker, 
however gifted, could break it, simply because there is no unambiguous
decode. (A one-time pad has unicity distance infinity.) However, the
fact that there is a unique solution to a problem does not mean that it is 
easy to find. We have excellent reasons, some of which are spelled out in the 
next section, to believe that there exist codes for which the unicity distance 
is essentially irrelevant to the maximum safe length of a message. 


11 Asymmetric systems 

Towards the end of the previous section we discussed a general coding scheme
depending on a shared secret key $k$ known to the encoder and the decoder.
However, the scheme can be generalised still further by splitting the secret
in two. Consider a system with two functions $F : \mathbb{F}_2^m \times \mathbb{F}_2^n \to \mathbb{F}_2^q$ and
$G : \mathbb{F}_2^r \times \mathbb{F}_2^q \to \mathbb{F}_2^n$ such that

\[ G(l, F(k, p)) = p. \]

Here $(k, l)$ will be a pair of secrets, $p$ the message and $z = F(k, p)$ the encoded
message, which can be decoded by using the fact that $G(l, z) = p$. In this
scheme the encoder must know $k$ but need not know $l$ and the decoder must
know $l$ but need not know $k$. Such a system is called asymmetric.

So far the idea is interesting but not exciting. Suppose however, that we 
can show that 

(i) knowing $F$, $G$ and $l$ it is very hard to find $k$,

(ii) if we do not know $k$ then, even if we know $F$ and $G$, it is very hard to
find $p$ from $F(k, p)$.

Then the code is secure at what we called level (3). 
Lemma 11.1. Suppose that the conditions specified above hold. Then an
opponent who is entitled to demand the encodings $z_i$ of any messages $p_i$ they
choose to specify will still find it very hard to find $p$ when given $F(k, p)$.

Let us write $F(k, p) = p^{K_A}$ and $G(l, z) = z^{K_A^{-1}}$ and think of $p^{K_A}$ as
participant $A$’s encipherment of $p$ and $z^{K_A^{-1}}$ as participant $B$’s decipherment
of $z$. We then have

\[ (p^{K_A})^{K_A^{-1}} = p. \]

Lemma 11.1 tells us that such a system is secure however many messages
are sent. Moreover, if we think of $A$ as a spy-master he can broadcast $K_A$
to the world (that is why such systems are called public key systems) and
invite anybody who wants to spy for him to send him secret messages in total
confidence.

It is all very well to describe such codes but do they exist? There is
very strong evidence that they do but so far all mathematicians have been 
able to do is to show that provided certain mathematical problems which are 
believed to be hard are indeed hard then good codes exist. 

The following problem is believed to be hard. 

Problem Given an integer N which is known to be the product N = pq of 
two primes p and q, find p and q. 

Several schemes have been proposed based on the assumption that this
factorisation is hard. (Note however that it is easy to find large primes $p$ and $q$.)
We give a very elegant scheme due to Rabin and Williams. It makes use of
some simple number theoretic results from IA and IB.

The following result was proved towards the end of the course Quadratic 
Mathematics and is, in any case, easy to obtain by considering primitive 
roots. 

Lemma 11.2. If $p$ is an odd prime the congruence

\[ x^2 \equiv d \mod p \]

is soluble if and only if $d \equiv 0$ or $d^{(p-1)/2} \equiv 1$ modulo $p$.

Lemma 11.3. Suppose $p$ is a prime such that $p = 4k - 1$ for some integer
$k$. Then if the congruence

\[ x^2 \equiv d \mod p \]

has any solution, it has $d^k$ as a solution.
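The reason behind Lemma 11.3 is a one line computation from Lemma 11.2. The following worked equation is a sketch added for convenience; it assumes that $d \not\equiv 0$ has a square root modulo $p$, so that $d^{(p-1)/2} \equiv 1$.

```latex
\[
  (d^k)^2 = d^{2k} = d^{(p+1)/2} = d \cdot d^{(p-1)/2} \equiv d \pmod{p},
  \qquad \text{since } p = 4k - 1 \text{ gives } 2k = \tfrac{p+1}{2}.
\]
```

If $d \equiv 0$ the claim is trivial, since then $d^k \equiv 0$ is itself the only square root.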

We now call on the Chinese remainder theorem. 
Lemma 11.4. Let $p$ and $q$ be primes of the form $4k - 1$ and set $N = pq$.
Then the following two problems are of equivalent difficulty.

(A) Given $N$ and $d$ find all the $m$ satisfying

\[ m^2 \equiv d \mod N. \]

(B) Given $N$ find $p$ and $q$.

(Note that, provided that $d \not\equiv 0$, knowing the solution to (A) for any
$d$ gives us the four solutions for $d = 1$.) The result is also true but much
harder to prove for general primes $p$ and $q$.

At the risk of giving aid and comfort to followers of the Lakatosian heresy 
it must be admitted that the statement of Lemma 11.4 does not really tell 
us what the result we are proving is, although the proof makes it clear that 
the result (whatever it may be) is certainly true. However, with more work, 
everything can be made precise. 

We can now give the Rabin-Williams scheme. The spy-master $A$ selects
two very large primes $p$ and $q$. (Since he has only done an undergraduate
course in mathematics he will take $p$ and $q$ of the form $4k - 1$.) He keeps the
pair $(p, q)$ secret but broadcasts the public key $N = pq$. If $B$ wants to send
him a message she writes it in binary code and splits it into blocks of length $m$
with $2^m < N < 2^{m+1}$. Each of these blocks is a number $r_j$ with $0 \le r_j < N$.
$B$ computes $s_j$ such that $r_j^2 \equiv s_j$ modulo $N$ and sends $s_j$. The spy-master
(who knows $p$ and $q$) can use the method of Lemma 11.4 to find the four
possible values for $r_j$ (the four square roots of $s_j$). Of these four possible
message blocks it is almost certain that three will be garbage so the fourth
will be the desired message.
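A toy version of the scheme may make the decoding step concrete. The sketch below uses the absurdly small primes 7 and 11 purely for illustration, takes square roots modulo each prime by the method of Lemma 11.3 and combines them by the Chinese remainder theorem. The function names are hypothetical, and the modular inverse `pow(x, -1, m)` needs Python 3.8 or later.

```python
def rabin_encode(r, n):
    """B's encoding of the block r: its square modulo the public key n."""
    return r * r % n


def sqrt_mod_prime(d, p):
    """A square root of d modulo a prime p = 4k - 1, namely d^k (Lemma 11.3)."""
    if p % 4 != 3:
        raise ValueError("this sketch assumes p = 4k - 1")
    k = (p + 1) // 4
    x = pow(d, k, p)
    if (x * x - d) % p != 0:
        raise ValueError("d is not a square modulo p")
    return x


def rabin_roots(s, p, q):
    """All square roots of s modulo n = pq, found from the secret p and q."""
    n = p * q
    a = sqrt_mod_prime(s % p, p)
    b = sqrt_mod_prime(s % q, q)
    u = q * pow(q, -1, p)     # u = 1 (mod p) and u = 0 (mod q)
    v = p * pow(p, -1, q)     # v = 0 (mod p) and v = 1 (mod q)
    return sorted({(sa * a * u + sb * b * v) % n
                   for sa in (1, -1) for sb in (1, -1)})


p, q = 7, 11
sent = rabin_encode(20, p * q)
candidates = rabin_roots(sent, p, q)
```

Here the block 20 is sent as 15 and the spy-master recovers the four candidates 13, 20, 57 and 64, among which the intended block must then be recognised.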

If the reader reflects, she will see that the ambiguity of the root is gen- 
uinely unproblematic. (If the decoding is mechanical then making each block 
start with some fixed sequence of length 50 will reduce the risk of ambigu- 
ity to negligible proportions.) Slightly more problematic, from the practical 
point of view, is the possibility that someone could be known to have sent a
very short message, that is to have started with an $m$ such that $1 \le m \le N^{1/2}$
but provided sensible precautions are taken this should not occur.

12 Commutative public key systems 

In the previous sections we introduced the coding and decoding functions
$K_A$ and $K_A^{-1}$ with the property that

\[ (p^{K_A})^{K_A^{-1}} = p, \]

and satisfying the condition that knowledge of $K_A$ did not help very much in
finding $K_A^{-1}$. We usually require, in addition, that our system be commutative
in the sense that

\[ (p^{K_A^{-1}})^{K_A} = p \]

and that knowledge of $K_A^{-1}$ does not help very much in finding $K_A$. The
Rabin-Williams scheme, as described in the last section, does not have this
property.

Commutative public key codes are very flexible and provide us with simple 
means for maintaining integrity, authenticity and non-repudiation. (This is 
not to say that non-commutative codes can not do the same; simply that 
commutativity makes many things easier.) 

Integrity and non-repudiation Let $A$ ‘own a code’, that is know both
$K_A$ and $K_A^{-1}$. Then $A$ can broadcast $K_A^{-1}$ to everybody so that everybody
can decode but only $A$ can encode. (We say that $K_A^{-1}$ is the public key and
$K_A$ the private key.) Then, for example, $A$ could issue tickets to
the castle ball carrying the coded message ‘admit Joe Bloggs’ which could be
read by the recipients and the guards but would be unforgeable. However,
for the same reason, $A$ could not deny that he had issued the invitation.

Authenticity If $B$ wants to be sure that $A$ is sending a message then $B$ can
send $A$ a harmless random message $q$. If $B$ receives back a message $p$ such
that $p^{K_A^{-1}}$ ends with the message $q$ then $A$ must have sent it to $B$. (Anybody
can copy a coded message but only $A$ can control the content.)

Signature Suppose now that $B$ owns a commutative code pair $(K_B, K_B^{-1})$
and has broadcast $K_B^{-1}$. If $A$ wants to send a message $p$ to $B$ he computes
$q = p^{K_A}$ and sends $p^{K_B^{-1}}$ followed by $q^{K_B^{-1}}$. $B$ can now use the fact
that

\[ (p^{K_B^{-1}})^{K_B} = p \]

to recover $p$ and $q$. $B$ then observes that $q^{K_A^{-1}} = p$. Since only $A$ can
produce a pair $(p, q)$ with this property, $A$ must have written it.

There is now a charming little branch of the mathematical literature 
based on these ideas in which Albert gets Bertha to authenticate a mes- 
sage from Caroline to David using information from Eveline and Fitzpatrick, 
Gilbert and Harriet play coin tossing down the phone and Ingred, Jacob, 
Katherine and Laszlo play bridge without using a pack of cards. However a 
cryptographic system is only as strong as its weakest link. Unbreakable
password systems do not prevent computer systems being regularly penetrated
by ‘hackers’ and however ‘secure’ a transaction on the net may be it may 
still involve a rogue on one end and a fool on the other. 

The most famous candidate for a commutative public key system is the 
RSA (Rivest, Shamir, Adleman) system. It was the RSA system that first
convinced the mathematical community that public key systems might be
feasible. The reader will have met the RSA in IA but we will push the ideas
a little bit further. 

Lemma 12.1. Let $p$ and $q$ be primes. If $N = pq$ and $\lambda(N) = \mathrm{lcm}(p - 1, q - 1)$
then

\[ M^{\lambda(N)} \equiv 1 \pmod{N} \]

for all integers $M$ coprime to $N$.

Since we wish to appeal to Lemma 11.4 we shall assume in what follows
that we have secretly chosen large primes $p$ and $q$ of the form $4k - 1$. (However,
as before, the arguments can be made to work for general large primes
$p$ and $q$.) We choose an integer $e$ coprime to $\lambda(N)$ and then use Euclid’s
algorithm to find an integer $d$ such that

\[ de \equiv 1 \pmod{\lambda(N)}. \]

Since others may be better psychologists than we are, we would be wise to 
use some sort of random method for choosing p, q and e. 

The public key includes the value of $e$ and $N$ but we keep secret the value
of $d$. Given a number $M$ with $1 \le M \le N - 1$ we encode it as the integer $E$
with $1 \le E \le N - 1$

\[ E \equiv M^d \pmod{N}. \]

The public decoding method is given by the observation that

\[ E^e \equiv M^{de} \equiv M \pmod{N}. \]

As was observed in IA, high powers are easy to compute.
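The recipe can be tried out on a toy example. The sketch below follows the convention of this section, with $d$ the encoding exponent and $e$ the decoding exponent, and uses the small primes 7 and 11 purely for illustration. The function names are hypothetical, and `pow(e, -1, lam)` (Python 3.8 or later) stands in for the use of Euclid’s algorithm.

```python
from math import gcd


def carmichael(p, q):
    """lambda(N) = lcm(p - 1, q - 1) for N = pq."""
    return (p - 1) * (q - 1) // gcd(p - 1, q - 1)


def encoding_exponent(p, q, e):
    """The integer d with d * e = 1 modulo lambda(N)."""
    lam = carmichael(p, q)
    if gcd(e, lam) != 1:
        raise ValueError("e must be coprime to lambda(N)")
    return pow(e, -1, lam)


p, q, e = 7, 11, 7
N = p * q
d = encoding_exponent(p, q, e)   # here lambda(N) = 30
M = 20
E = pow(M, d, N)                 # encode: E = M^d (mod N)
decoded = pow(E, e, N)           # decode: E^e = M^(de) = M (mod N)
```

The built-in three argument `pow` uses repeated squaring, which is the sense in which high powers are easy to compute.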

To show that (providing that factoring N is indeed hard) finding d from 
e and N is hard we use the following lemma. 

Lemma 12.2. Suppose that d, e and N are as above. Set $de - 1 = 2^a b$ where
b is odd.

(i) $a \ge 1$.

(ii) If x is coprime to N, $y \equiv x^b \pmod{N}$ and $y \not\equiv 1 \pmod{N}$, then there exists
an r with $1 \le r \le 2^{a-1}$ such that

\[ y^r \not\equiv 1 \quad \text{but} \quad y^{2r} \equiv 1 \pmod{N}. \]

Combined with Lemma 11.4, the idea of Lemma 12.2 gives a fast probabilistic
algorithm where by making random choices of x we very rapidly
reduce the probability that we cannot find p and q to as close to zero as we
wish.
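
The following Python sketch shows one way such a probabilistic algorithm might be organised. It is an illustration under assumptions rather than the argument of the notes: the function name `factor_from_exponents` is invented here, and the loop simply squares $y = x^b$ repeatedly until it meets a square root of 1 other than ±1, at which point a greatest common divisor splits N. It assumes that d and e really do satisfy $de \equiv 1 \pmod{\lambda(N)}$.

```python
import random
from math import gcd

def factor_from_exponents(N, e, d, rng=None):
    """Try to split N = p*q given d, e with d*e = 1 (mod lcm(p-1, q-1))."""
    rng = rng or random.Random()
    a, b = 0, d * e - 1
    while b % 2 == 0:            # write d*e - 1 = 2**a * b with b odd
        a, b = a + 1, b // 2
    while True:
        x = rng.randrange(2, N - 1)
        g = gcd(x, N)
        if g > 1:                # lucky: x already shares a factor with N
            return g, N // g
        y = pow(x, b, N)
        if y in (1, N - 1):      # this x tells us nothing; choose another
            continue
        for _ in range(a):
            z = y * y % N
            if z == 1:           # y*y = 1 but y is not +1 or -1 (mod N)
                f = gcd(y - 1, N)
                return f, N // f
            if z == N - 1:       # reached -1: only trivial roots, retry
                break
            y = z

print(sorted(factor_from_exponents(209, 7, 13)))
```

Each random x either succeeds or is discarded, so repeating the choice drives the failure probability down geometrically.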

Lemma 12.3. The problem of finding d from the public information e and
N is essentially as hard as factorising N.

Remark 1 At first glance we seem to have done as well for the RSA code 
as for the Rabin-Williams code. But this is not so. In Lemma 11.4 we
showed that finding the four solutions of $M^2 \equiv E \pmod{N}$ was equivalent
to factorising N. In the absence of further information, finding one root is
as hard as finding another. Thus the ability to break the Rabin-Williams
code (without some tremendous stroke of luck) is equivalent to the ability to
factor N. On the other hand it is, a priori, possible that there might be
a decoding method for the RSA code which did not involve knowing
d. Thus it might be possible to break the RSA code without finding d. It
must, however, be said that, in spite of this problem, the RSA code is much 
used in practice and the Rabin-Williams code is not. 

Remark 2 It is natural to ask what evidence there is that the factorisation
problem really is hard. Properly organised, trial division requires
$O(N^{1/2})$ operations to factorise a number N. This order of magnitude was
not bettered until 1972 when Lehman produced an $O(N^{1/3})$ method. In 1974,
Pollard 15 produced an $O(N^{1/4})$ method. In 1979, as interest in the problem
grew because of its connection with secret codes, Lenstra made a breakthrough
to an $O(e^{c((\log N)(\log\log N))^{1/2}})$ method with $c \approx 2$. Since then some
progress has been made (Pollard reached $O(e^{2((\log N)(\log\log N))^{1/3}})$) but in spite
of intense efforts mathematicians have not produced anything which would
be a real threat to codes based on the factorisation problem. In 1996, it was
possible to factor 100 (decimal) digit numbers routinely, 150 digit numbers
with immense effort but 200 digit numbers were out of reach.

Organisations which use the RSA and related systems rely on ‘security
through publicity’. Because the problem of cracking RSA codes is so notorious
any breakthrough is likely to be publicly announced 16 . Moreover, even
if a breakthrough occurs it is unlikely to be one which can be easily exploited
by the average criminal. So long as the secrets covered by RSA-type codes
need only be kept for a few months rather than forever, the codes can be
considered to be one of the strongest links in the security chain.

15 Although mathematically trained, Pollard worked outside the professional mathematical
community.

16 And if not, is most likely to be a government rather than a Mafia secret. 





13 Trapdoors and signatures 

It might be thought that secure codes are all that are needed to ensure the 
security of communications but this is not so. It is not necessary to read
a message to derive information from it 17 . In the same way, it may not be
necessary to be able to write a message in order to tamper with it. 

Here is a somewhat far fetched but worrying example. Suppose that by 
wire tapping or by looking over people’s shoulders I find that a bank creates
messages in the form $M_1$, $M_2$ where $M_1$ is the name of the client and $M_2$ is the
sum to be transferred to the client’s account. The messages are then encoded
according to the RSA scheme discussed after Lemma 12.1 as $Z_1 = M_1^e$ and
$Z_2 = M_2^e$. I then enter into a transaction with the bank which adds \$1000 to
my account. I observe the resulting $Z_1$ and $Z_2$ and then transmit $Z_1$ followed
by $Z_2^3$.

Example 13.1. What will (I hope) be the result of this transaction. 

We say that the RSA scheme is vulnerable to ‘homomorphism attack’. 
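
The weakness comes from the fact that RSA encoding is multiplicative: a power of an encoded message is the encoding of the same power of the message. The short Python sketch below illustrates this with toy values that are assumptions of mine (primes 1019 and 1031, public exponent 3, and a transferred sum of 100), taking the tamperer to cube the intercepted block without ever learning the secret exponent.

```python
from math import gcd

# Toy bank key (assumed values, far too small for real use).
p, q = 1019, 1031
N = p * q
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)
e = 3                      # public encoding exponent
d = pow(e, -1, lam)        # the bank's secret (Python 3.8+)

def encode(M):
    return pow(M, e, N)

def decode(Z):
    return pow(Z, d, N)

M2 = 100                   # sum in the honest transaction
Z2 = encode(M2)            # what the eavesdropper observes

forged = pow(Z2, 3, N)     # tampering needs only Z2 and N
print(decode(forged))      # the bank reads M2**3 (mod N)
```

Since (M^e)^3 = (M^3)^e, the bank decodes the forged block as the cube of the original sum.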

One way of increasing security against tampering is to first code our 
message by a classical coding method and then use our RSA (or similar) scheme
on the result. 

Exercise 13.2. Discuss briefly the effect of first using an RSA scheme and 
then a classical code. 

However there is another way forward which has the advantage of wider 
applicability since it also can be used to protect the integrity of open (non- 
coded) messages and to produce password systems. These are the so called 
signature systems. (Note that we shall be concerned with the ‘signature of 
the message’ and not the signature of the sender.) 

Definition 13.3. A signature or trapdoor or hashing function is a mapping
$H : \mathcal{M} \to \mathcal{S}$ from the space $\mathcal{M}$ of possible messages to the space $\mathcal{S}$ of possible
signatures.

(Let me admit at once that Definition 13.3 is more of a statement of notation 
than a useful definition.) The first requirement of a good signature function 
is that the space $\mathcal{M}$ should be much larger than the space $\mathcal{S}$ so that H is a
many-to-one function (in fact a great-many-to-one function) so that we cannot
work back from H(M) to M. The second requirement is that $\mathcal{S}$ should
be large so that a forger cannot (sensibly) hope to hit on H(M) by luck.

17 During World War II, British bomber crews used to spend the morning before a night
raid testing their equipment; this included the radios.

Obviously we should aim at the same kind of security as that offered by
our ‘level 2’ for codes:

Prospective opponents should find it hard to find H(M) given M
if they are in possession of a plentiful supply of message, signature
pairs $(M_i, H(M_i))$ of messages $M_i$, together with their encodings.

I leave it to the reader to think about level 3 security (or to look at section 
12.6 of [9]). 

Here is a signature scheme due to Elgamal. The message sender A chooses
a very large prime p, some integer $1 < g < p$, and some other integer u
with $1 < u < p$ (as usual, some randomisation scheme should be used). A
then releases the values of p, g and $y = g^u$ (modulo p) but keeps the value of u
secret. Whenever he sends a message m (some positive integer) he chooses
another integer k with $1 \le k \le p - 2$ at random and computes r and s with
$1 \le r \le p - 1$ and $0 \le s \le p - 2$ by the rules 18

\[ r \equiv g^k \pmod{p} \tag{*} \]

\[ m \equiv ur + ks \pmod{p - 1} \tag{**} \]

Lemma 13.4. If conditions (*) and (**) are satisfied then

\[ g^m \equiv y^r r^s \pmod{p}. \]

If A sends the message m followed by the signature (r, s) the recipient need
only verify the relation $g^m \equiv y^r r^s \pmod{p}$ to check that the message is
authentic.
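
Here is a small Python sketch of signing and checking under this scheme. The parameters p = 467, g = 2 and u = 127 are toy values assumed for illustration, and the function names `sign` and `verify` are mine. As a simplification the sketch rejects every k that is not coprime to p - 1 and draws again, which is slightly more cautious than the footnote requires.

```python
import random
from math import gcd

# Toy public parameters and secret (assumed values, far too small for real use).
p = 467                  # a prime
g = 2
u = 127                  # A's secret
y = pow(g, u, p)         # published alongside p and g

def sign(m, rng):
    """Return (r, s) with r = g^k (mod p) and m = u*r + k*s (mod p - 1)."""
    while True:
        k = rng.randrange(1, p - 1)          # 1 <= k <= p - 2
        if gcd(k, p - 1) != 1:
            continue                         # (**) may be insoluble: redraw k
        r = pow(g, k, p)
        s = (m - u * r) * pow(k, -1, p - 1) % (p - 1)
        return r, s

def verify(m, r, s):
    """Check g^m = y^r * r^s (mod p), as in Lemma 13.4."""
    return 0 < r < p and pow(g, m, p) == pow(y, r, p) * pow(r, s, p) % p

r, s = sign(100, random.Random(0))
print(verify(100, r, s), verify(101, r, s))
```

The recipient uses only the public values p, g and y, while producing s requires the secret u.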

Since k is random it is believed that the only way to forge signatures is
to find u from $g^u$ and it is believed that this problem, which is known as the
discrete logarithm problem, is very hard.

Needless to say, even if it is impossible to tamper with a message, sig- 
nature pair it is always possible to copy one. Every message should thus 
contain a unique identifier such as a time stamp. 

The evidence that the discrete logarithm problem is very hard is of the
same kind and strength as the evidence that the factorisation problem
is very hard. We conclude our discussion with a description of the Diffie-Hellman
key exchange system which is also based on the discrete logarithm
problem.

18 There is a small point which I have glossed over here and elsewhere. Unless k and
$p - 1$ are coprime the equation (**) may not be soluble. However the quickest way to
solve (**) if it is soluble is Euclid’s algorithm which will also reveal if (**) is insoluble. If
(**) is insoluble we simply choose another k at random and try again.

The modern coding schemes which we have discussed have the disadvantage
that they require lots of computation. This is not a disadvantage when
we deal slowly with a few important messages. For the Web where we must 
deal speedily with a lot of less than world shattering messages sent by im- 
patient individuals this is a grave disadvantage. Classical coding schemes 
are fast but become insecure with reuse. Key exchange schemes use modern 
codes to communicate a new secret key for each message. Once the secret 
key has been sent slowly, a fast classical method based on the secret key is 
used to encode and decode the message. Since a different secret key is used 
each time, the classical code is secure. 

How is this done? Suppose A and B are at opposite ends of a tapped
telephone line. A sends B a (randomly chosen) large prime p and a randomly
chosen g with $1 < g < p - 1$. Since the telephone line is insecure A and B
must assume that p and g are public knowledge. A now chooses randomly a
secret number $\alpha$ and tells B the value of $g^\alpha$. B chooses randomly a secret
number $\beta$ and tells A the value of $g^\beta$. Since

\[ g^{\alpha\beta} = (g^\alpha)^\beta = (g^\beta)^\alpha, \]

both A and B can compute $k = g^{\alpha\beta}$ modulo p and k becomes the shared
secret key.

The eavesdropper is left with the problem of finding $k = g^{\alpha\beta}$ from knowledge
of g, $g^\alpha$ and $g^\beta$ (modulo p). It is conjectured that this is essentially as
hard as finding $\alpha$ and $\beta$ from the values of g, $g^\alpha$ and $g^\beta$ (modulo p) and this
is the discrete logarithm problem.
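
The exchange can be tried out in a few lines of Python. The prime $2^{127} - 1$ and the base g = 3 below are assumptions chosen only so that the example runs instantly, and the variable names are illustrative; a real exchange would use a much larger, carefully chosen prime.

```python
import secrets

# Public values sent over the tapped line (assumed toy choices).
p = 2**127 - 1          # a Mersenne prime, far too small for real use
g = 3

# Each party keeps one random number secret.
alpha = 2 + secrets.randbelow(p - 3)   # A's secret, 2 <= alpha <= p - 2
beta = 2 + secrets.randbelow(p - 3)    # B's secret, 2 <= beta <= p - 2

to_B = pow(g, alpha, p)   # A tells B the value of g^alpha (mod p)
to_A = pow(g, beta, p)    # B tells A the value of g^beta (mod p)

k_A = pow(to_A, alpha, p)   # A computes (g^beta)^alpha
k_B = pow(to_B, beta, p)    # B computes (g^alpha)^beta
print(k_A == k_B)
```

Only `p`, `g`, `to_B` and `to_A` ever travel over the line; the shared key is never transmitted.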

We conclude with a quotation from Galbraith (referring to his time as 
ambassador to India) taken from Koblitz’s entertaining text [5]. 

I had asked that a cable from Washington to New Delhi ... be 
reported to me through the Toronto consulate. It arrived in code; 
no facilities existed for decoding. They brought it to me at the 
airport — a mass of numbers. I asked if they assumed I could 
read it. They said no. I asked how they managed. They said 
that when something arrived in code, they phoned Washington 
and had the original read to them. 


14 Further reading 

For many students this will be the last university mathematics course they 
will take. Although the twin subjects of error-correcting codes and cryptog- 
raphy occupy a small place in the grand panorama of modern mathematics, 
it seems to me that they form a very suitable topic for such a final course.

Outsiders often think of mathematicians as guardians of abstruse but
settled knowledge. Even those who understand that there are still problems
unsettled ask what mathematicians will do when they run out of problems.
At a more subtle level Kline’s magnificent Mathematical Thought from Ancient
to Modern Times [4] is pervaded by the melancholy thought that though
the problems will not run out they may become more and more baroque and
inbred. ‘You are not the mathematicians your parents were’ whispers Kline
‘and your problems are not the problems your parents’ were.’

However, when we look at this course we see that the idea of error-correcting
codes did not exist before 1940. The best designs of such codes
depend on the kind of ‘abstract algebra’ that historians like Kline and Bell
consider a dead end but which lies behind the superior performance of CD players
and similar artifacts.

In order to go further into codes, whether secret or error correcting,
we need to go into the question of how the information content of a
message is to be measured. ‘Information theory’ has its roots in the code
breaking of World War II (though technological needs would doubtless have
led to the same ideas shortly thereafter anyway). Its development required a
level of sophistication in treating probability which was simply not available 
in the 19th century. (Even the Markov chain is essentially 20th century.) 

The question of what makes a calculation difficult could not even have 
been thought about until Gödel’s theorem (itself a product of the great ‘foundations
crisis’ at the beginning of the 20th century). Developments by Turing
and Church of Gödel’s theorem gave us a theory of computational complexity
which is still under development today. The question of whether there
exist ‘provably hard’ public codes is intertwined with still unanswered ques- 
tions in complexity theory. There are links with the profound (and very 20th 
century) question of what constitutes a random number. 

Finally the invention of the electronic computer has produced a cultural 
change in the attitude of mathematicians towards algorithms. Before 1950, 
the construction of algorithms was a minor interest of a few mathematicians. 
(Gauss and Jacobi were considered unusual in the amount of thought they
gave to actual computation.) Today we would consider a mathematician as
much a maker of algorithms as a prover of theorems. The notion of the
probabilistic algorithm which hovered over much of our discussion of secret 
codes is a typical invention of the last decades of the 20th century. 

Although both subjects are now ‘mature’ in the sense that they provide 
usable and well tested tools for practical application they still contain deep 
unanswered questions. For example 

How close to the Shannon bound can a ‘computationally easy’ error correcting
code get?

Do provably hard public codes exist?

Even if these questions are too hard there must surely exist error correct- 
ing and public codes based on new ideas. Such ideas would be most welcome 
and, although they are most likely to come from the professionals they might 
come from outside the usual charmed circles. 

The best book I know for further reading is Welsh [9]. After this the book 
of Goldie and Pinch [7] provides a deeper idea of the meaning of information 
and its connection with the topic. The book by Koblitz [5] develops the
number theoretic background. The economic and practical importance of 
transmitting, storing and processing data far outweighs the importance of 
hiding it. However, hiding data is more romantic. For budding cryptologists 
and cryptographers (as well as those who want a good read) Kahn’s The 
Codebreakers [2] has the same role as is taken by Bell’s Men of Mathematics
for budding mathematicians.


15 First Sheet of Exercises


Because this is a third term course I have tried to keep the questions simple. 
On the whole Examples will have been looked at in the lectures and Exercises 
will not but the distinction is not very clear. 


Q 15.1. Do Exercise 1.1. In the model of a communication channel we take 
the probability p of error to be less than 1/2. Why do we not consider the 
case 1 > p > 1/2? What if p = 1/2? 

Q 15.2. Do Exercise 2.3. Machines tend to communicate in binary strings
so this course concentrates on binary alphabets with two symbols. There is 
no particular difficulty in extending our ideas to alphabets with n symbols 
though, of course, some tricks will only work for particular values of n. If 
you look at the inner title page of almost any recent book you will find its 
International Standard Book Number (ISBN). The ISBN uses single digits 
selected from 0, 1, . . . , 8, 9 and X representing 10. Each ISBN consists of 
nine such digits $a_1, a_2, \ldots, a_9$ followed by a single check digit $a_{10}$ chosen so
that

\[ 10a_1 + 9a_2 + \cdots + 2a_9 + a_{10} \equiv 0 \bmod 11. \tag{*} \]

(In more sophisticated language our code C consists of those elements $a \in \mathbb{F}_{11}^{10}$
such that $\sum_{j=1}^{10} (11 - j) a_j = 0$.)

(i) Find a couple of books and check that (*) holds for their ISBNs. 

(ii) Show that (*) will not work if you make a mistake in writing down 
one digit of an ISBN. 

(iii) Show that (*) may fail to detect two errors. 

(iv) Show that (*) will not work if you interchange two adjacent digits. 
Errors of type (ii) and (iv) are the most common in typing. 
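
For experimenting with parts (i) to (iv), the check (*) is easily coded. The Python sketch below is only an aid: the function name `isbn10_ok` is mine, and the sample number 0-306-40615-2 is an assumed test value rather than one taken from the notes.

```python
def isbn10_ok(isbn):
    """Check 10*a1 + 9*a2 + ... + 2*a9 + a10 = 0 (mod 11)."""
    digits = [10 if ch == "X" else int(ch) for ch in isbn if ch not in "- "]
    if len(digits) != 10:
        return False
    # enumerate counts j = 0..9, so the weights run 10, 9, ..., 1
    return sum((10 - j) * a for j, a in enumerate(digits)) % 11 == 0

print(isbn10_ok("0-306-40615-2"))   # a correctly formed number
print(isbn10_ok("0306406153"))      # one digit wrong
print(isbn10_ok("0360406152"))      # two adjacent digits interchanged
print(isbn10_ok("1306406153"))      # two digits wrong
```

Trying other pairs of changes shows which double errors slip through and which do not.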

Q 15.3. Do Exercise 2.4. Suppose we use eight hole tape with the standard
paper tape code and the probability that an error occurs at a particular
place on the tape (i.e. a hole occurs where it should not or fails to occur
where it should) is $10^{-4}$. A program requires about 10 000 lines of tape (each
line containing eight places) using the paper tape code. Using the Poisson 
approximation, direct calculation (possible with a hand calculator but really 
no advance on the Poisson method) or otherwise show that the probability 
that the tape will be accepted as error free by the decoder is less than .04%. 

Suppose now that we use the Hamming scheme (making no use of the last
place in each line). Explain why the program requires about 17500 lines of
tape but that any particular line will be correctly decoded with probability
about $1 - (21 \times 10^{-8})$ and the probability that the entire program will be
correctly decoded is better than 99.6%.
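
The numbers quoted in this question can be checked with a few lines of Python. The sketch assumes that each line of the paper tape code carries seven data places and one parity place, and that each Hamming line carries four data places in seven; the variable names are mine.

```python
from math import exp

p = 1e-4                      # probability of error at any one place

# Paper tape code: 10 000 lines of 8 places, one parity check per line.
poisson_no_error = exp(-10_000 * 8 * p)      # Poisson estimate of "no errors at all"
line_even = (1 + (1 - 2 * p) ** 8) / 2       # a line passes if it has an even number of errors
tape_accepted = line_even ** 10_000

# Hamming scheme: 7 places used per line, 4 of them carrying data.
hamming_lines = 10_000 * 7 // 4
line_ok = (1 - p) ** 7 + 7 * p * (1 - p) ** 6   # at most one error in the line
program_ok = line_ok ** hamming_lines

print(poisson_no_error, tape_accepted)
print(hamming_lines, 1 - line_ok, program_ok)
```

The direct calculation and the Poisson estimate agree to the accuracy the question asks for.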

Q 15.4. Show that if $0 < \delta < 1/2$ there exists an $A(\delta) > 0$ such that
whenever $0 \le r \le n\delta$ we have

\[ \sum_{j=0}^{r} \binom{n}{j} \le A(\delta) \binom{n}{r}. \]

(We use weaker estimates in the course but this is the most illuminating.)

Q 15.5. Show that the n-fold repetition code is perfect if and only if n is 
odd. 

Q 15.6. Let C be the code consisting of the word 10111000100 and its cyclic 
shifts (that is 01011100010, 00101110001 and so on) together with the zero 
code word. Is C linear? Show that C has minimum distance 5. 

Q 15.7. Write down the weight enumerators of the trivial code, the repeti- 
tion code and the simple parity code. 

Q 15.8. List the codewords of the Hamming (7,4) code and its dual. Write 
down the weight enumerators and verify that they satisfy the MacWilliams
identity. 

Q 15.9. (a) Show that if C is linear then so are its extension $C^+$, truncation
$C^-$ and puncturing $C'$ provided the symbol chosen to puncture by is 0.

(b) Show that extension and truncation do not change the size of a code.
Show that it is possible to puncture a code without reducing the information
rate.

(c) Show that the minimum distance of the parity extension $C^+$ is the
least even integer n with $n \ge d(C)$. Show that the minimum distance of $C^-$
is d(C) or $d(C) - 1$. Show that puncturing does not change the minimum
distance.

Q 15.10. If $C_1$ and $C_2$ are of appropriate type with generator matrices $G_1$
and $G_2$ write down a generator matrix for $C_1 | C_2$.

Q 15.11. Show that the weight enumerator of RM(d, 1) is

\[ y^{2^d} + (2^{d+1} - 2)\, x^{2^{d-1}} y^{2^{d-1}} + x^{2^d}. \]

Q 15.12. Do Exercise 3.6 which shows that even if $2^n / V(n, e)$ is an integer,
no perfect code may exist.

(i) Verify that

\[ \frac{2^{90}}{V(90, 2)} = 2^{78}. \]

(ii) Suppose that C is a perfect 2 error correcting code of length 90 and
size $2^{78}$. Explain why we may suppose without loss of generality that $0 \in C$.

(iii) Let C be as in (ii) with $0 \in C$. Consider the set

\[ X = \{ x \in \mathbb{F}_2^{90} : x_1 = 1,\ x_2 = 1,\ d(0, x) = 3 \}. \]

Show that corresponding to each $x \in X$ we can find a unique $c(x) \in C$ such
that $d(c(x), x) = 2$.

(iv) Continuing with the argument of (iii) show that

\[ d(c(x), 0) = 5 \]

and that $c_i(x) = 1$ whenever $x_i = 1$. By looking at $d(c(x), c(x'))$ for $x, x' \in X$
and invoking the Dirichlet pigeon-hole principle, or otherwise, obtain a
contradiction.

(v) Conclude that there is no perfect $[90, 2^{78}]$ code.
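
Part (i) is a one line computation, sketched here in Python for anyone who wishes to check their arithmetic. The function name `ball_volume` is mine; it simply counts the words within Hamming distance e of a fixed word of length n.

```python
from math import comb

def ball_volume(n, e):
    """Number of binary words of length n within Hamming distance e of a given word."""
    return sum(comb(n, j) for j in range(e + 1))

V = ball_volume(90, 2)          # 1 + 90 + 4005
print(V, 2**90 % V == 0, 2**90 // V == 2**78)
```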

16 Second Sheet of Exercises


Because this is a third term course I have tried to keep the questions simple. 
On the whole Examples will have been looked at in the lectures and Exercises 
will not but the distinction is not very clear. 


Q 16.1. An erasure is a digit which has been made unreadable in transmis- 
sion. Why are they easier to deal with than errors? Find a necessary and 
sufficient condition on the parity check matrix of a linear (n,k) code for it 
to be able to correct t erasures and relate t to n and k in a useful manner. 

Q 16.2. Consider the collection K of polynomials

\[ a_0 + a_1\omega + a_2\omega^2 \]

with $a_j \in \mathbb{F}_2$ manipulated subject to the usual rules of polynomial arithmetic
and the further condition

\[ 1 + \omega + \omega^2 = 0. \]

Show by direct calculation that $K^* = K \setminus \{0\}$ is a cyclic group under multiplication
and deduce that K is a finite field.

[Of course, this follows directly from general theory but direct calculation is 
not uninstructive.] 

Q 16.3. (i) Identify the cyclic codes of length n corresponding to each of
the polynomials 1, $X - 1$ and $X^{n-1} + X^{n-2} + \cdots + X + 1$.

(ii) Show that there are three cyclic codes of length 7 corresponding to 
irreducible polynomials of which two are versions of Hamming’s original code. 
What are the other cyclic codes? 

(iii) Identify the dual codes for each of the codes in (ii). 

Q 16.4. Do Example 7.15. 

(i) If K is a field containing $\mathbb{F}_2$ then $(a + b)^2 = a^2 + b^2$ for all $a, b \in K$.

(ii) If $P \in \mathbb{F}_2[X]$ and K is a field containing $\mathbb{F}_2$ then $P(a)^2 = P(a^2)$ for
all $a \in K$.

(iii) Let K be a field containing $\mathbb{F}_2$ in which $X^7 - 1$ factorises into linear
factors. If $\beta$ is a root of $X^3 + X + 1$ in K then $\beta$ is a primitive root of unity
and $\beta^2$ is also a root of $X^3 + X + 1$.

(iv) We continue with the notation of (iii). The BCH code with $\{\beta, \beta^2\}$ as
defining set is Hamming’s original (7,4) code.

Q 16.5. A binary non-linear feedback register of length 4 has defining relation

\[ x_{n+1} = x_n x_{n-1} + x_{n-3}. \]

Show that the state space contains 4 cycles of lengths 1, 2, 4 and 9.
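
A brute force check of the cycle structure is easy to write and may be a useful companion to the hand calculation. The Python sketch below assumes the defining relation is $x_{n+1} = x_n x_{n-1} + x_{n-3}$ over $\mathbb{F}_2$ and represents the state as the tuple $(x_{n-3}, x_{n-2}, x_{n-1}, x_n)$; the function names are mine.

```python
from itertools import product

def step(state):
    """Shift the register once: x_{n+1} = x_n * x_{n-1} + x_{n-3} over F_2."""
    a, b, c, d = state              # (x_{n-3}, x_{n-2}, x_{n-1}, x_n)
    return (b, c, d, (d & c) ^ a)

def cycle_lengths():
    """Lengths of the cycles into which the 16 states fall."""
    seen, lengths = set(), []
    for start in product((0, 1), repeat=4):
        if start in seen:
            continue
        length, s = 0, start
        while s not in seen:        # step is a bijection, so this walks one cycle
            seen.add(s)
            s = step(s)
            length += 1
        lengths.append(length)
    return sorted(lengths)

print(cycle_lengths())
```

The map is invertible because $x_{n-3}$ can be recovered from the new state, which is why every state lies on a cycle.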

Q 16.6. A binary LFSR of length 5 was used to generate the following
stream

101011101100 . . .

Recover the feedback polynomial by the Berlekamp-Massey method.
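
For readers who would like to check a hand computation, here is a compact Python sketch of the Berlekamp-Massey method over $\mathbb{F}_2$ in its usual textbook form. The function names, and the convention that the result `c` records the recurrence $s_i = c_1 s_{i-1} + \cdots + c_L s_{i-L}$, are choices made for this sketch and need not match the notation of the lectures. The test stream is generated by an assumed register, not the one in the question.

```python
def berlekamp_massey(bits):
    """Return (L, c) for the shortest LFSR over F_2 generating `bits`,
    where s_i = c[1]*s_{i-1} + ... + c[L]*s_{i-L} (mod 2) and c[0] = 1."""
    n = len(bits)
    c = [1] + [0] * n        # current connection polynomial
    b = [1] + [0] * n        # polynomial from before the last length change
    L, m = 0, -1             # current length, index of last length change
    for i in range(n):
        d = bits[i]          # discrepancy between predicted and actual bit
        for j in range(1, L + 1):
            d ^= c[j] & bits[i - j]
        if d:
            t = c[:]
            shift = i - m
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]
            if 2 * L <= i:
                L, m, b = i + 1 - L, i, t
    return L, c[:L + 1]

def lfsr(seed, taps, count):
    """Generate `count` bits with s_i equal to the sum mod 2 of s_{i-j}, j in taps."""
    s = list(seed)
    while len(s) < count:
        bit = 0
        for j in taps:
            bit ^= s[-j]
        s.append(bit)
    return s

stream = lfsr([1, 0, 0, 0, 0], (2, 5), 20)   # assumed test register
print(berlekamp_massey(stream))
```

At least 2L consecutive bits are needed before the answer for a register of length L is determined.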

Q 16.7. Do Exercise 8.5. Consider the linear recurrence

\[ x_n = a_0 x_{n-d} + a_1 x_{n-d+1} + \cdots + a_{d-1} x_{n-1} \tag{*} \]

with $a_j \in \mathbb{F}_2$ and $a_0 \ne 0$.

(i) Suppose K is a field containing $\mathbb{F}_2$ such that the auxiliary polynomial
C has a root $\alpha$ in K. Then $\alpha^n$ is a solution of (*) in K.

(ii) Suppose K is a field containing $\mathbb{F}_2$ such that the auxiliary polynomial
C has d distinct roots $\alpha_1, \alpha_2, \ldots, \alpha_d$ in K. Then the general solution of (*)
in K is

\[ x_n = \sum_{j=1}^{d} b_j \alpha_j^n \]

for some $b_j \in K$. If $x_0, x_1, \ldots, x_{d-1} \in \mathbb{F}_2$ then $x_n \in \mathbb{F}_2$ for all n.

(iii) Work out the first few lines of Pascal’s triangle modulo 2. Show that
the functions $f_j : \mathbb{Z} \to \mathbb{F}_2$

\[ f_j(n) = \binom{n}{j} \]

are linearly independent in the sense that

\[ \sum_{j=0}^{m} a_j f_j(n) = 0 \]

for all n implies $a_j = 0$ for $0 \le j \le m$.

(iv) Suppose K is a field containing $\mathbb{F}_2$ such that the auxiliary polynomial
C factorises completely into linear factors. If the root $\alpha_u$ has multiplicity $m_u$
[$1 \le u \le q$] then the general solution of (*) in K is

\[ x_n = \sum_{u=1}^{q} \sum_{v=0}^{m_u - 1} b_{u,v} \binom{n}{v} \alpha_u^n \]

for some $b_{u,v} \in K$. If $x_0, x_1, \ldots, x_{d-1} \in \mathbb{F}_2$ then $x_n \in \mathbb{F}_2$ for all n.

Q 16.8. Do Exercise 10.8. One of the most confidential German codes (called
FISH by the British) involved a complex mechanism which the British found
could be simulated by two loops of paper tape of length 1501 and 1497. If
$k_n = x_n + y_n$ where $x_n$ is a stream of period 1501 and $y_n$ is a stream of period
1497 what is the longest possible period of $k_n$? How many consecutive values
of $k_n$ do you need to specify the sequence completely?

Q 16.9. We work in $\mathbb{F}_2$. I have a secret sequence $k_1, k_2, \ldots$ and a message
$p_1, p_2, \ldots, p_N$. I transmit $p_1 + k_1, p_2 + k_2, \ldots, p_N + k_N$ and then, by error,
transmit $p_1 + k_2, p_2 + k_3, \ldots, p_N + k_{N+1}$. Assuming that you know this and
that my message makes sense how would you go about finding my message?
Can you now decipher other messages sent using the same part of my secret 
sequence? 

Q 16.10. Give an example of a homomorphism attack on an RSA code. 
Show in reasonable detail that the Elgamal signature scheme defeats it. 

Q 16.11. I announce that I shall be using the Rabin-Williams scheme with
modulus N. My agent in X’Dofdro sends me a message m (with $1 \le m \le N - 1$)
encoded in the requisite form. Unfortunately my cat eats the piece
of paper on which the prime factors of N are recorded so I am unable to
decipher it. I therefore find a new pair of primes and announce that I shall
be using the Rabin-Williams scheme with modulus $N_1 > N$. My agent now
recodes the message and sends it to me again.

The dreaded SNDO of X’Dofdro intercept both code messages. Show that 
they can find m. Can they decipher any other messages sent to me using 
only one of the coding schemes? 

Q 16.12. Extend the Diffie-Hellman key exchange system to cover three participants
in a way that is likely to be as secure as the two party scheme.

Extend the system to n parties in such a way that they can compute their
common secret key in at most $n^2 - n$ communications.